digitalmars.D - Looking for champion - std.lang.d.lex
- Walter Bright (15/15) Oct 21 2010 As we all know, tool support is important for D's success. Making tools ...
- Ellery Newcomer (3/3) Oct 21 2010 and how about
- Jonathan M Davis (13/18) Oct 21 2010 That would seem like a good idea (though part of me cringes at the idea ...
- Don (6/26) Oct 22 2010 In the long term, the requirements for CTFE will be pretty much:
- Jonathan M Davis (10/32) Oct 21 2010 You mean that you're going to make someone actually pull out their compi...
- bearophile (6/8) Oct 21 2010 You may open the project here:
- Russel Winder (14/22) Oct 21 2010 Of course using BitBucket or Launchpad may well be more likely to get
- Jonathan M Davis (9/20) Oct 21 2010 I've never actually used Mercurial or Bazaar. I do use git all the time ...
- Walter Bright (4/8) Oct 21 2010 Not really, you can just use the dmd lexer source as a guide. Should be
- Jonathan M Davis (17/19) Oct 21 2010 Does this mean that you want a pseudo-port of the C++ front end's lexer ...
- Walter Bright (10/29) Oct 21 2010 Yes, but not a straight port. The C++ version has things in it that are
- Jonathan M Davis (6/43) Oct 22 2010 Okay. Good to know. I'll start looking at the C++ front end some time in...
- Lutger (4/10) Oct 22 2010 If you are gonna port from the C++ front end, there is already a port
- dolive (2/38) Oct 22 2010 dmd2.050 October will release it ? thank's
- dolive (2/22) Oct 22 2010 Do you have Scintilla for D ?
- dolive (2/27) Oct 22 2010 Should be port Scintilla to D.
- dolive (2/27) Oct 22 2010 Should be port Scintilla to D.
- BLS (6/25) Oct 22 2010 Why not creating a DLL/so based Lexer/Parser based on the existing DMD
- Walter Bright (2/6) Oct 22 2010 I've done things like that before, they're even more work.
- Jacob Carlborg (8/38) Oct 22 2010 I think it would be better to create a lexer/parser in D and have it in
- Nick Sabalausky (3/42) Oct 22 2010 *cough* DDMD
- Jacob Carlborg (6/50) Oct 23 2010 I know, I would more than love to see DDMD becoming the official D
- Tomek Sowiński (20/39) Oct 22 2010
- Walter Bright (3/11) Oct 22 2010 Lexers are so simple, it is less work to just build them by hand than us...
- Andrei Alexandrescu (4/15) Oct 22 2010 I wrote a C++ lexer. It wasn't at all easy except if I compared it
- Andrei Alexandrescu (21/55) Oct 22 2010 Yes. IMHO writing a D tokenizer is a wasted effort. We need a tokenizer
- Nick Sabalausky (15/38) Oct 22 2010 FWIW, I've been converting my Goldie lexing/parsing library/toolset (
- Sean Kelly (2/19) Oct 22 2010 What about, say, floating-point literals? It seems like the first eleme...
- Andrei Alexandrescu (11/30) Oct 22 2010 Yah, with regard to such regular patterns (strings, comments, numbers,
- Sean Kelly (5/43) Oct 23 2010 For the second, that may push the work of recognizing some lexical
- Sean Kelly (3/50) Oct 23 2010 Or maybe not. A /* could be CommentBegin. I'll have to think on it a bit
- Sean Kelly (4/58) Oct 23 2010 I still think it won't work. The stuff inside the comment would come
- Andrei Alexandrescu (19/62) Oct 23 2010 I was thinking comments could be easily caught by simple routines:
- Walter Bright (12/15) Oct 23 2010 I agree, a set of "canned" and heavily optimized lexing functions for co...
- Andrei Alexandrescu (7/23) Oct 23 2010 I don't see these two in tension. "General" does not need entail
- Walter Bright (6/11) Oct 23 2010 In general I agree with you, but that is a major project to do that and ...
- Sean Kelly (3/73) Oct 23 2010 Ah so the only issue is identifying the first set for a lexical element,
- Nick Sabalausky (3/65) Oct 23 2010 What's wrong with regexes? That's pretty typical for lexers.
- Andrei Alexandrescu (6/9) Oct 23 2010 I mentioned that using regexes is possible but would make it much more
- Nick Sabalausky (20/29) Oct 23 2010 I see. Maybe a lexer 2.0 thing.
- Walter Bright (2/3) Oct 23 2010 They don't handle recursion.
- Nick Sabalausky (19/22) Oct 23 2010 Neither do plain-old strings. But regexes will get you farther than plai...
- Nick Sabalausky (4/27) Oct 23 2010 And FWIW, I was already thnking about making some improvements to Goldie...
- Nick Sabalausky (5/36) Oct 23 2010 But that's all if you want generalized lexing or parsing though. If you ...
- bearophile (4/7) Oct 23 2010 Is the DDMD licence compatible with the Phobos one? Is the DDMD author(s...
- Nick Sabalausky (7/15) Oct 23 2010 I'd certainly hope so. If it isn't, then that would probably mean DMD's ...
- Denis Koroskin (3/24) Oct 24 2010 Sorry, I wasn't checking the forum. IIRC DMD license is GPL so DDMD must...
- Nick Sabalausky (5/33) Oct 24 2010 According to a random file I picked out of trunk, it's dual-licensed wit...
- Nick Sabalausky (4/40) Oct 24 2010 That does surprise me though, since I'm pretty sure Phobos is Boost Lice...
- Walter Bright (4/6) Oct 24 2010 Phobos is Boost licensed to enable maximum usage for any purpose.
- Jacob Carlborg (8/26) Oct 24 2010 As Walter wrote in the first post of this thread: "generally follow
- Walter Bright (3/5) Oct 23 2010 The problem is I never have used parser/lexer generators, so I am not re...
- Nick Sabalausky (36/41) Oct 23 2010 Understandable.
- Walter Bright (3/5) Oct 24 2010 It looks nice, but in clicking around on FAQ, documentation, getting sta...
- Nick Sabalausky (36/41) Oct 24 2010 Well, that's because that program (GOLD Parser Builder) is just a tool t...
- Walter Bright (2/2) Oct 24 2010 It looks like a solid engine, and a nice tool. Does it belong as part of...
- Walter Bright (3/4) Oct 24 2010 One question I have is how does it compare with Spirit? That would be it...
- div0 (6/10) Oct 24 2010 Spirit is a LL parser, so it's not really suitable for human edited
- Nick Sabalausky (18/22) Oct 24 2010 Can't say I'm really familiar with Spirit. From a brief lookover, these ...
- Walter Bright (4/25) Oct 24 2010 Does Goldie have (like Spirit) a set of canned routines for things like ...
- Nick Sabalausky (18/44) Oct 24 2010 No, but such things can easily be provided in the docs for simple
- Walter Bright (11/39) Oct 24 2010 In the regexp code, I provided special regexes for email addresses and U...
- Nick Sabalausky (22/33) Oct 24 2010 I'm not sure what exectly you're suggesting in these two paragraphs? (Or...
- Walter Bright (8/48) Oct 25 2010 Are all tokens returned as strings?
- Nick Sabalausky (34/42) Oct 25 2010 Goldie's lexer (and parser) are based on the GOLD system (
- Walter Bright (11/60) Oct 25 2010 Consider a string literal, say "abc\"def". With Goldie's method, I infer...
- Nick Sabalausky (60/75) Oct 25 2010 Yea, that is true. With that string in the input, the value given to the...
- Walter Bright (8/85) Oct 25 2010 Probably that's why I don't use lexer generators. Building lexers is the...
- Nick Sabalausky (30/36) Oct 26 2010 I've taken a deeper look at Spirit's docs:
- bearophile (4/5) Oct 26 2010 I have not used Spirit, but from what I have read, it doesn't scale (the...
- Nick Sabalausky (17/22) Oct 26 2010 I think that's just because it's C++ though. I'd bet a D lib that worked...
- Leandro Lucarella (11/18) Oct 26 2010 I can confirm that, at least for Spirit 1, and for simple things it
- dennis luehring (8/19) Oct 26 2010 yupp - Spirit feels right on the integration-side, but becomes more and
- dennis luehring (4/25) Oct 26 2010 that combined with compiletime-features something like the bsn-parse do
- Nick Sabalausky (41/54) Oct 26 2010 Goldie (and any GOLD-based system, really) should scale up pretty well. ...
- Jacob Carlborg (6/25) Oct 26 2010 I don't have much knowledge in this area but isn't this what a
- Tomek Sowiński (31/58) Oct 22 2010
- Bruno Medeiros (10/49) Nov 19 2010 Agreed, of all the things desired for D, a D tokenizer would rank pretty...
- Jonathan M Davis (16/66) Nov 19 2010 We want to make it easy for tools to be built to work on and deal with D...
- Bruno Medeiros (9/68) Nov 19 2010 And by providing a lexer and a parser outside the standard library,
- Jonathan M Davis (20/90) Nov 19 2010 A,
- Bruno Medeiros (25/42) Nov 19 2010 Eh? That license argument doesn't make sense: if the lexer and parser
- Todd VanderVeen (7/10) Nov 19 2010 I agree. I do like the suggestion for developing the D grammar in Antlr ...
- Bruno Medeiros (5/15) Nov 19 2010 See the comment I made below, to Michael Stover. (
- Jonathan M Davis (50/78) Nov 19 2010 It's very different to have D implementation of something - which is bas...
- Bruno Medeiros (13/62) Nov 24 2010 There are some misunderstandings here. First, the DMD front-end is
- Andrei Alexandrescu (3/52) Nov 19 2010 Even C has strtok.
- Bruno Medeiros (6/61) Nov 24 2010 That's just a fancy splitter, I wouldn't call that a proper tokenizer. I...
- Bruno Medeiros (4/66) Nov 24 2010 In other words, a lexer, that might be a better term in this context.
- bearophile (8/14) Oct 23 2010 This is a quite long talk by Steve Yegge that I've just seen (linked fro...
- bearophile (2/4) Oct 23 2010 Sorry, the Reddit thread:
- Nick Sabalausky (4/35) Oct 23 2010 I haven't looked at the video, but that sounds like the direction I've h...
- Bruno Medeiros (31/45) Nov 24 2010 Hum, very interesting topic! A few disjoint comments:
- Andrew Wiley (5/48) Nov 24 2010 be used in this way. The Eclipse plugin for Scala (and I assume the Netb...
- Bruno Medeiros (8/52) Nov 25 2010 Interesting, very wise of them to do that.
- Nick Sabalausky (9/10) Oct 26 2010 I'm curious, is your reason for this purely to avoid allocations during
- Walter Bright (10/19) Oct 26 2010 It's one big giant reason. Storage allocation gets unbelievably costly i...
- bearophile (5/7) Oct 26 2010 Java was designed to be simple! Simple means to have a more uniform sema...
- Walter Bright (15/29) Oct 26 2010 So was Pascal. See the thread about how useless it was as a result.
- retard (9/23) Oct 27 2010 Blablabla.. this nostalgic lesson reminded me, have you even started
- Walter Bright (4/11) Oct 27 2010 If that were true, why are Java char/int/double types value types, not a...
- bearophile (12/22) Oct 27 2010 purposes, different behaviors, etc.
- Walter Bright (3/7) Oct 27 2010 So, there is "value" in value types after all. I confess I have no idea ...
- bearophile (5/7) Oct 27 2010 I am not arguing against them in absolute. They are good in some situati...
- Bruno Medeiros (22/28) Nov 19 2010 I've been hearing that a lot, but I find this to be excessively
- Bruno Medeiros (4/9) Nov 19 2010 There's good simple, and there's bad simple...
- Nick Sabalausky (67/78) Oct 26 2010 Honestly, I'm not entirely certain whether or not Goldie actually needs ...
- Walter Bright (7/9) Oct 26 2010 I use a tagged variant for the token struct.
- retard (3/15) Oct 27 2010 This is why the basic data structure in functional languages, algebraic
- Walter Bright (3/5) Oct 27 2010 I think you recently demonstrated otherwise, as proven by the widespread...
- retard (4/10) Oct 27 2010 I don't understand your logic -- Widespread use of Java proves that
- Walter Bright (7/18) Oct 27 2010 You told me that widespread use of Java proved that nothing more complex...
- retard (9/31) Oct 27 2010 I only meant that the widespead adoption of Java shows how the public at...
- Walter Bright (6/14) Oct 27 2010 Choice of a language has numerous factors, so you cannot dismiss one fac...
- retard (10/28) Oct 27 2010 I don't think I said anything that contradicts that.
- Nick Sabalausky (7/9) Oct 27 2010 The public at large is convinced that "Java is fast now, really!". So I'...
- Todd D. VanderVeen (15/17) Oct 27 2010 Legacy in the sense that C is perhaps.
- retard (14/17) Oct 27 2010 Probably the top 10 names are more or less correct there, but some funny...
- Don (6/28) Oct 28 2010 I reckon Fortran is the one to look at it. If Tiobe's stats were
- Matthias Pleh (18/46) Oct 28 2010 There was an article in the Ct-Magazin (German) where they took a closer...
- Bruno Medeiros (16/26) Nov 19 2010 Java is quickly becoming a legacy language? the next COBOL? SRSLY?...
- bearophile (4/7) Nov 19 2010 Java on Adroid is not going well, there is a Oracle->Google lawsuit in p...
- Andrew Wiley (7/13) Nov 19 2010 I have to agree with Bruno here, Java isn't going anywhere soon. It has ...
- Nick Sabalausky (16/34) Nov 23 2010 To be clear, I meant Java the language, not Java the VM. But yea, you're...
- Michael Stover (15/47) Nov 19 2010 As for D lexers and tokenizers, what would be nice is to
- Bruno Medeiros (11/24) Nov 19 2010 Yes, that would be much better. It would be directly and immediately
- Michael Stover (4/32) Nov 19 2010 so that was 4 months ago - how do things currently stand on that initiat...
- Matthias Pleh (5/34) Nov 20 2010 There is a project with an antlr D-grammar in work.
- Bruno Medeiros (15/44) Nov 24 2010 I don't know about Ellery, as you can see in that thread he/she(?)
- Ellery Newcomer (12/24) Nov 24 2010 Normally I go by 'it'.
- Bruno Medeiros (12/42) Nov 24 2010 I didn't meant to offend or anything, I was just unsure of that. To me
- Ellery Newcomer (4/9) Nov 24 2010 None taken; I'm just laughing at you. As I understand it, though,
- bearophile (4/6) Nov 24 2010 In Python newsgroups I have seen few women, now and then, but in the D n...
- Daniel Gibson (5/14) Nov 24 2010 At my university there are *very* few woman studying computer science.
- Nick Sabalausky (4/23) Nov 25 2010 See, that's the #1 worst thing about the field of programming: Total sau...
- Bruno Medeiros (12/31) Nov 26 2010 It is well know that there is a big gender gap in CS with regards to
- Bruno Medeiros (11/42) Nov 19 2010 "the widespead adoption of Java shows how the public at large cares very...
- dolive (2/22) Feb 26 2011 intense support! Someone to do it?
- Jonathan M Davis (7/33) Feb 26 2011 ed
- dolive (3/35) Feb 26 2011 thanks, make an all out effort !
As we all know, tool support is important for D's success. Making tools easier to build will help with that.

To that end, I think we need a lexer for the standard library - std.lang.d.lex. It would be helpful in writing color syntax highlighting filters, pretty printers, repl, doc generators, static analyzers, and even D compilers. It should:

1. support a range interface for its input, and a range interface for its output
2. optionally not generate lexical errors, but just try to recover and continue
3. optionally return comments and ddoc comments as tokens
4. the tokens should be a value type, not a reference type
5. generally follow along with the C++ one so that they can be maintained in tandem

It can also serve as the basis for creating a javascript implementation that can be embedded into web pages for syntax highlighting, and eventually an std.lang.d.parse.

Anyone want to own this?
Oct 21 2010
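[Editor's note: a minimal sketch of how requirements 1, 2 and 4 above could look in D. The names (TinyLexer, TokType) are invented here and this is not the proposed std.lang.d.lex API; a real lexer would accept any character input range, handle all of D's token kinds, and carry source locations.]

import std.ascii : isAlpha, isAlphaNum, isWhite;
import std.stdio : writeln;

enum TokType { identifier, plus }

struct Token                  // requirement 4: a value type, not a reference type
{
    TokType type;
    string  text;             // slice of the source text
}

struct TinyLexer              // requirement 1: the lexer is an input range of Tokens
{
    private string src;
    private Token  cur;
    private bool   exhausted;

    this(string s) { src = s; popFront(); }

    @property bool empty()  { return exhausted; }
    @property Token front() { return cur; }

    void popFront()
    {
        while (src.length && isWhite(src[0]))      // skip whitespace
            src = src[1 .. $];

        if (src.length == 0) { exhausted = true; return; }

        if (src[0] == '+')
        {
            cur = Token(TokType.plus, src[0 .. 1]);
            src = src[1 .. $];
        }
        else if (isAlpha(src[0]) || src[0] == '_')
        {
            size_t n = 1;
            while (n < src.length && (isAlphaNum(src[n]) || src[n] == '_'))
                ++n;
            cur = Token(TokType.identifier, src[0 .. n]);
            src = src[n .. $];
        }
        else
        {
            src = src[1 .. $];   // requirement 2: recover and continue, no lexical error
            popFront();
        }
    }
}

void main()
{
    foreach (tok; TinyLexer("alpha + _beta"))
        writeln(tok.type, " '", tok.text, "'");
}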
On Thursday 21 October 2010 15:12:41 Ellery Newcomer wrote:
> and how about
> 6. ctfe compatible ?

That would seem like a good idea (though part of me cringes at the idea of a program specifically running the lexer (and possibly the parser) as part of its own compilation process), but for the main purpose of being used for tools for D, that would seem completely unnecessary. So, I'd say that it would be a good idea to make it CTFE-able if it is at all reasonable to do so but that if making it CTFE-able would harm the design for more typical use, then it shouldn't be made CTFE-able. Personally, I don't have a good feel for exactly what is CTFE-able though, so I have no idea how easy it would be to make it CTFE-able. However, it does seem like a good idea if it's reasonable to do so. And if it's not, hopefully as dmd's CTFE capabilities become more advanced, it will become possible to do so.

- Jonathan M Davis
Oct 21 2010
Jonathan M Davis wrote:
> On Thursday 21 October 2010 15:12:41 Ellery Newcomer wrote:
>> and how about
>> 6. ctfe compatible ?
>
> That would seem like a good idea (though part of me cringes at the idea of a program specifically running the lexer (and possibly the parser) as part of its own compilation process), but for the main purpose of being used for tools for D, that would seem completely unnecessary. So, I'd say that it would be a good idea to make it CTFE-able if it is at all reasonable to do so but that if making it CTFE-able would harm the design for more typical use, then it shouldn't be made CTFE-able. Personally, I don't have a good feel for exactly what is CTFE-able though, so I have no idea how easy it would be to make it CTFE-able. However, it does seem like a good idea if it's reasonable to do so. And if it's not, hopefully as dmd's CTFE capabilities become more advanced, it will become possible to do so.
>
> - Jonathan M Davis

In the long term, the requirements for CTFE will be pretty much:

1. the function must be safe (eg, no asm).
2. the function must be pure
3. the compiler must have access to the source code

You'll probably satisfy all those requirements anyway.
Oct 22 2010
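[Editor's note: a tiny illustration of the three requirements above. countTokens is a made-up stand-in for a real lexer entry point; because it is pure, @safe, and its source is visible, assigning its result to an enum forces compile-time evaluation.]

pure @safe size_t countTokens(string src)
{
    size_t n;
    bool inWord;
    foreach (c; src)
    {
        immutable space = c == ' ' || c == '\t' || c == '\n';
        if (!space && !inWord) { ++n; inWord = true; }
        else if (space) inWord = false;
    }
    return n;
}

enum tokenCount = countTokens("int x = 42;");   // evaluated via CTFE
static assert(tokenCount == 4);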
On Thursday, October 21, 2010 15:01:21 Walter Bright wrote:As we all know, tool support is important for D's success. Making tools easier to build will help with that. To that end, I think we need a lexer for the standard library - std.lang.d.lex. It would be helpful in writing color syntax highlighting filters, pretty printers, repl, doc generators, static analyzers, and even D compilers. It should: 1. support a range interface for its input, and a range interface for its output 2. optionally not generate lexical errors, but just try to recover and continue 3. optionally return comments and ddoc comments as tokens 4. the tokens should be a value type, not a reference type 5. generally follow along with the C++ one so that they can be maintained in tandem It can also serve as the basis for creating a javascript implementation that can be embedded into web pages for syntax highlighting, and eventually an std.lang.d.parse. Anyone want to own this?You mean that you're going to make someone actually pull out their compiler book? ;) I'd love to do this (lexers and parsers are great fun IMHO - it's the code generation that isn't so fun), but I'm afraid that I'm busy enough at the moment that if I take it on, it won't get done very quickly. It is so very tempting though... So, as long as you're not in a hurry, I'm up for it, but I can't guarantee anything even approaching fast delivery. - Jonathan M Davis
Oct 21 2010
Jonathan M Davis:So, as long as you're not in a hurry, I'm up for it, but I can't guarantee anything even approaching fast delivery.You may open the project here: http://github.com/ And then other people may help you along the way. Bye, bearophile
Oct 21 2010
On Thu, 2010-10-21 at 19:51 -0400, bearophile wrote:
> Jonathan M Davis:
>> So, as long as you're not in a hurry, I'm up for it, but I can't guarantee anything even approaching fast delivery.
>
> You may open the project here:
> http://github.com/
>
> And then other people may help you along the way.

Of course using BitBucket or Launchpad may well be more likely to get support as Mercurial and Bazaar are so much more usable than Git.

-- 
Russel.

Dr Russel Winder        t: +44 20 7585 2200   voip: sip:russel.winder ekiga.net
41 Buckmaster Road      m: +44 7770 465 077   xmpp: russel russel.org.uk
London SW11 1EN, UK     w: www.russel.org.uk  skype: russel_winder
Oct 21 2010
On Thursday, October 21, 2010 17:24:34 Russel Winder wrote:On Thu, 2010-10-21 at 19:51 -0400, bearophile wrote:I've never actually used Mercurial or Bazaar. I do use git all the time though. I quite like it. Now, it could be Mercurial or Bazaar is better (like I said, I haven't used them), but I do find git to be quite useable. The simple fact that I can just create a repository in place instead of having to set up a separate location for a repository (like you have to do with svn) is a _huge_ improvement. I didn't really use source control on my personal projects before git. git actually makes it easy enough to do so that I do it all the time now. - Jonathan M DavisJonathan M Davis:Of course using BitBucket or Launchpad may well be more likely to get support as Mercurial and Bazaar are so much more usable that Git.So, as long as you're not in a hurry, I'm up for it, but I can't guarantee anything even approaching fast delivery.You may open the project here: http://github.com/ And then other people may help you along the way.
Oct 21 2010
Jonathan M Davis wrote:You mean that you're going to make someone actually pull out their compiler book? ;)Not really, you can just use the dmd lexer source as a guide. Should be straightforward.So, as long as you're not in a hurry, I'm up for it, but I can't guarantee anything even approaching fast delivery.As long as it gets done!
Oct 21 2010
On Thursday 21 October 2010 15:01:21 Walter Bright wrote:5. generally follow along with the C++ one so that they can be maintained in tandemDoes this mean that you want a pseudo-port of the C++ front end's lexer to D for this? Or are you looking for just certain pieces of it to be similar? I haven't looked at the front end code yet, so I don't know how it works there, but I wouldn't expect it to uses ranges, for instance, so I would expect that the basic design would naturally stray a bit from whatever was done in C++ simply by doing things in fairly idiomatic D. And if I do look at the front end to see how that's done, there's the issue of the license. As I understand it, the front end is LGPL, and Phobos is generally Boost, which would mean that I would be looking at LGPL-licensed code when designing Boost-licensed, even though it wouldn't really be copying the code per se since it's a change of language (though if you did the whole front end, obviously the license issue can be waved quite easily). License issues aside, however, I do think that it would make sense for std.lang.d.lex to do things similiarly to the C++ front end, even if there are a number of basic differences. - Jonathan M Davis
Oct 21 2010
Jonathan M Davis wrote:On Thursday 21 October 2010 15:01:21 Walter Bright wrote:Yes, but not a straight port. The C++ version has things in it that are unnecessary for the D version, like the external string table (should use an associative array instead), the support for lookahead can be put in the parser, doesn't tokenize comments, etc. Essentially I'd like the D lexer to be self-contained in one file.5. generally follow along with the C++ one so that they can be maintained in tandemDoes this mean that you want a pseudo-port of the C++ front end's lexer to D for this? Or are you looking for just certain pieces of it to be similar?I haven't looked at the front end code yet, so I don't know how it works there, but I wouldn't expect it to uses ranges, for instance, so I would expect that the basic design would naturally stray a bit from whatever was done in C++ simply by doing things in fairly idiomatic D. And if I do look at the front end to see how that's done, there's the issue of the license. As I understand it, the front end is LGPL, and Phobos is generally Boost, which would mean that I would be looking at LGPL-licensed code when designing Boost-licensed, even though it wouldn't really be copying the code per se since it's a change of language (though if you did the whole front end, obviously the license issue can be waved quite easily).Since the license is mine, I can change the D version to the Boost license, no problem.License issues aside, however, I do think that it would make sense for std.lang.d.lex to do things similiarly to the C++ front end, even if there are a number of basic differences.Yup. The idea is the D version lexes exactly the same grammar as the dmd one. The easiest way to ensure that is to do equivalent logic.
Oct 21 2010
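[Editor's note: a rough sketch of the "associative array instead of the external string table" remark above, i.e. interning identifiers so each distinct spelling is stored once. The names Identifier and intern are invented here; this is not DMD's code.]

class Identifier
{
    string name;
    this(string n) { name = n; }
}

Identifier[string] internTable;

Identifier intern(string spelling)
{
    if (auto p = spelling in internTable)
        return *p;
    auto id = new Identifier(spelling);
    internTable[spelling] = id;
    return id;
}

unittest
{
    assert(intern("foo") is intern("foo"));    // one object per distinct spelling
    assert(intern("foo") !is intern("bar"));
}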
On Thursday 21 October 2010 23:55:42 Walter Bright wrote:Jonathan M Davis wrote:Okay. Good to know. I'll start looking at the C++ front end some time in the next few days, but like I said, I really don't know how much time I'm going to be able to spend on it, so it won't necessarily be quick. However, porting logic should be much faster than doing it from scratch. - Jonathan M DavisOn Thursday 21 October 2010 15:01:21 Walter Bright wrote:Yes, but not a straight port. The C++ version has things in it that are unnecessary for the D version, like the external string table (should use an associative array instead), the support for lookahead can be put in the parser, doesn't tokenize comments, etc. Essentially I'd like the D lexer to be self-contained in one file.5. generally follow along with the C++ one so that they can be maintained in tandemDoes this mean that you want a pseudo-port of the C++ front end's lexer to D for this? Or are you looking for just certain pieces of it to be similar?I haven't looked at the front end code yet, so I don't know how it works there, but I wouldn't expect it to uses ranges, for instance, so I would expect that the basic design would naturally stray a bit from whatever was done in C++ simply by doing things in fairly idiomatic D. And if I do look at the front end to see how that's done, there's the issue of the license. As I understand it, the front end is LGPL, and Phobos is generally Boost, which would mean that I would be looking at LGPL-licensed code when designing Boost-licensed, even though it wouldn't really be copying the code per se since it's a change of language (though if you did the whole front end, obviously the license issue can be waved quite easily).Since the license is mine, I can change the D version to the Boost license, no problem.License issues aside, however, I do think that it would make sense for std.lang.d.lex to do things similiarly to the C++ front end, even if there are a number of basic differences.Yup. The idea is the D version lexes exactly the same grammar as the dmd one. The easiest way to ensure that is to do equivalent logic.
Oct 22 2010
Jonathan M Davis wrote: ...Okay. Good to know. I'll start looking at the C++ front end some time in the next few days, but like I said, I really don't know how much time I'm going to be able to spend on it, so it won't necessarily be quick. However, porting logic should be much faster than doing it from scratch. - Jonathan M DavisIf you are gonna port from the C++ front end, there is already a port called ddmd which may give you a head start: www.dsource.org/projects/ddmd
Oct 22 2010
Walter Bright 写到:Jonathan M Davis wrote:dmd2.050 October will release it ? thank'sOn Thursday 21 October 2010 15:01:21 Walter Bright wrote:Yes, but not a straight port. The C++ version has things in it that are unnecessary for the D version, like the external string table (should use an associative array instead), the support for lookahead can be put in the parser, doesn't tokenize comments, etc. Essentially I'd like the D lexer to be self-contained in one file.5. generally follow along with the C++ one so that they can be maintained in tandemDoes this mean that you want a pseudo-port of the C++ front end's lexer to D for this? Or are you looking for just certain pieces of it to be similar?I haven't looked at the front end code yet, so I don't know how it works there, but I wouldn't expect it to uses ranges, for instance, so I would expect that the basic design would naturally stray a bit from whatever was done in C++ simply by doing things in fairly idiomatic D. And if I do look at the front end to see how that's done, there's the issue of the license. As I understand it, the front end is LGPL, and Phobos is generally Boost, which would mean that I would be looking at LGPL-licensed code when designing Boost-licensed, even though it wouldn't really be copying the code per se since it's a change of language (though if you did the whole front end, obviously the license issue can be waved quite easily).Since the license is mine, I can change the D version to the Boost license, no problem.License issues aside, however, I do think that it would make sense for std.lang.d.lex to do things similiarly to the C++ front end, even if there are a number of basic differences.Yup. The idea is the D version lexes exactly the same grammar as the dmd one. The easiest way to ensure that is to do equivalent logic.
Oct 22 2010
Walter Bright 写到:As we all know, tool support is important for D's success. Making tools easier to build will help with that. To that end, I think we need a lexer for the standard library - std.lang.d.lex. It would be helpful in writing color syntax highlighting filters, pretty printers, repl, doc generators, static analyzers, and even D compilers. It should: 1. support a range interface for its input, and a range interface for its output 2. optionally not generate lexical errors, but just try to recover and continue 3. optionally return comments and ddoc comments as tokens 4. the tokens should be a value type, not a reference type 5. generally follow along with the C++ one so that they can be maintained in tandem It can also serve as the basis for creating a javascript implementation that can be embedded into web pages for syntax highlighting, and eventually an std.lang.d.parse. Anyone want to own this?Do you have Scintilla for D ?
Oct 22 2010
dolive 写到:Walter Bright 写到:Should be port Scintilla to D.As we all know, tool support is important for D's success. Making tools easier to build will help with that. To that end, I think we need a lexer for the standard library - std.lang.d.lex. It would be helpful in writing color syntax highlighting filters, pretty printers, repl, doc generators, static analyzers, and even D compilers. It should: 1. support a range interface for its input, and a range interface for its output 2. optionally not generate lexical errors, but just try to recover and continue 3. optionally return comments and ddoc comments as tokens 4. the tokens should be a value type, not a reference type 5. generally follow along with the C++ one so that they can be maintained in tandem It can also serve as the basis for creating a javascript implementation that can be embedded into web pages for syntax highlighting, and eventually an std.lang.d.parse. Anyone want to own this?Do you have Scintilla for D ?
Oct 22 2010
dolive 写到:Walter Bright 写到:Should be port Scintilla to D.As we all know, tool support is important for D's success. Making tools easier to build will help with that. To that end, I think we need a lexer for the standard library - std.lang.d.lex. It would be helpful in writing color syntax highlighting filters, pretty printers, repl, doc generators, static analyzers, and even D compilers. It should: 1. support a range interface for its input, and a range interface for its output 2. optionally not generate lexical errors, but just try to recover and continue 3. optionally return comments and ddoc comments as tokens 4. the tokens should be a value type, not a reference type 5. generally follow along with the C++ one so that they can be maintained in tandem It can also serve as the basis for creating a javascript implementation that can be embedded into web pages for syntax highlighting, and eventually an std.lang.d.parse. Anyone want to own this?Do you have Scintilla for D ?
Oct 22 2010
Why not creating a DLL/so based Lexer/Parser based on the existing DMD front end.? It could be always up to date. Necessary Steps. functional wrappers around C++ classes, Implementing the visitor pattern (AST), create std.lex and std.parse.. my 2 cents On 22/10/2010 00:01, Walter Bright wrote:As we all know, tool support is important for D's success. Making tools easier to build will help with that. To that end, I think we need a lexer for the standard library - std.lang.d.lex. It would be helpful in writing color syntax highlighting filters, pretty printers, repl, doc generators, static analyzers, and even D compilers. It should: 1. support a range interface for its input, and a range interface for its output 2. optionally not generate lexical errors, but just try to recover and continue 3. optionally return comments and ddoc comments as tokens 4. the tokens should be a value type, not a reference type 5. generally follow along with the C++ one so that they can be maintained in tandem It can also serve as the basis for creating a javascript implementation that can be embedded into web pages for syntax highlighting, and eventually an std.lang.d.parse. Anyone want to own this?
Oct 22 2010
BLS wrote:Why not creating a DLL/so based Lexer/Parser based on the existing DMD front end.? It could be always up to date. Necessary Steps. functional wrappers around C++ classes, Implementing the visitor pattern (AST), create std.lex and std.parse..I've done things like that before, they're even more work.
Oct 22 2010
On 2010-10-22 17:37, BLS wrote:Why not creating a DLL/so based Lexer/Parser based on the existing DMD front end.? It could be always up to date. Necessary Steps. functional wrappers around C++ classes, Implementing the visitor pattern (AST), create std.lex and std.parse.. my 2 cents On 22/10/2010 00:01, Walter Bright wrote:I think it would be better to create a lexer/parser in D and have it in the standard library. Then one could begin the process of porting the DMD frontend using this library. Then hopefully the DMD frontend will be written in D and use this new library, being one code base and will always be up to date. -- /Jacob CarlborgAs we all know, tool support is important for D's success. Making tools easier to build will help with that. To that end, I think we need a lexer for the standard library - std.lang.d.lex. It would be helpful in writing color syntax highlighting filters, pretty printers, repl, doc generators, static analyzers, and even D compilers. It should: 1. support a range interface for its input, and a range interface for its output 2. optionally not generate lexical errors, but just try to recover and continue 3. optionally return comments and ddoc comments as tokens 4. the tokens should be a value type, not a reference type 5. generally follow along with the C++ one so that they can be maintained in tandem It can also serve as the basis for creating a javascript implementation that can be embedded into web pages for syntax highlighting, and eventually an std.lang.d.parse. Anyone want to own this?
Oct 22 2010
"Jacob Carlborg" <doob me.com> wrote in message news:i9spln$lbj$1 digitalmars.com...On 2010-10-22 17:37, BLS wrote:*cough* DDMDWhy not creating a DLL/so based Lexer/Parser based on the existing DMD front end.? It could be always up to date. Necessary Steps. functional wrappers around C++ classes, Implementing the visitor pattern (AST), create std.lex and std.parse.. my 2 cents On 22/10/2010 00:01, Walter Bright wrote:I think it would be better to create a lexer/parser in D and have it in the standard library. Then one could begin the process of porting the DMD frontend using this library. Then hopefully the DMD frontend will be written in D and use this new library, being one code base and will always be up to date.As we all know, tool support is important for D's success. Making tools easier to build will help with that. To that end, I think we need a lexer for the standard library - std.lang.d.lex. It would be helpful in writing color syntax highlighting filters, pretty printers, repl, doc generators, static analyzers, and even D compilers. It should: 1. support a range interface for its input, and a range interface for its output 2. optionally not generate lexical errors, but just try to recover and continue 3. optionally return comments and ddoc comments as tokens 4. the tokens should be a value type, not a reference type 5. generally follow along with the C++ one so that they can be maintained in tandem It can also serve as the basis for creating a javascript implementation that can be embedded into web pages for syntax highlighting, and eventually an std.lang.d.parse. Anyone want to own this?
Oct 22 2010
On 2010-10-22 22:42, Nick Sabalausky wrote:"Jacob Carlborg"<doob me.com> wrote in message news:i9spln$lbj$1 digitalmars.com...I know, I would more than love to see DDMD becoming the official D compiler but if that will happen I would still like that the frontend is based on the lexer/parser library in phobos. -- /Jacob CarlborgOn 2010-10-22 17:37, BLS wrote:*cough* DDMDWhy not creating a DLL/so based Lexer/Parser based on the existing DMD front end.? It could be always up to date. Necessary Steps. functional wrappers around C++ classes, Implementing the visitor pattern (AST), create std.lex and std.parse.. my 2 cents On 22/10/2010 00:01, Walter Bright wrote:I think it would be better to create a lexer/parser in D and have it in the standard library. Then one could begin the process of porting the DMD frontend using this library. Then hopefully the DMD frontend will be written in D and use this new library, being one code base and will always be up to date.As we all know, tool support is important for D's success. Making tools easier to build will help with that. To that end, I think we need a lexer for the standard library - std.lang.d.lex. It would be helpful in writing color syntax highlighting filters, pretty printers, repl, doc generators, static analyzers, and even D compilers. It should: 1. support a range interface for its input, and a range interface for its output 2. optionally not generate lexical errors, but just try to recover and continue 3. optionally return comments and ddoc comments as tokens 4. the tokens should be a value type, not a reference type 5. generally follow along with the C++ one so that they can be maintained in tandem It can also serve as the basis for creating a javascript implementation that can be embedded into web pages for syntax highlighting, and eventually an std.lang.d.parse. Anyone want to own this?
Oct 23 2010
On 22-10-2010 at 00:01:21, Walter Bright <newshound2 digitalmars.com> wrote:
> As we all know, tool support is important for D's success. Making tools easier to build will help with that.
>
> To that end, I think we need a lexer for the standard library - std.lang.d.lex. It would be helpful in writing color syntax highlighting filters, pretty printers, repl, doc generators, static analyzers, and even D compilers. It should:
>
> 1. support a range interface for its input, and a range interface for its output
> 2. optionally not generate lexical errors, but just try to recover and continue
> 3. optionally return comments and ddoc comments as tokens
> 4. the tokens should be a value type, not a reference type
> 5. generally follow along with the C++ one so that they can be maintained in tandem
>
> It can also serve as the basis for creating a javascript implementation that can be embedded into web pages for syntax highlighting, and eventually an std.lang.d.parse.
>
> Anyone want to own this?

Interesting idea. Here's another: D will soon need bindings for CORBA, Thrift, etc, so lexers will have to be written all over to grok interface files. Perhaps a generic tokenizer which can be parametrized with a lexical grammar would bring more ROI, I got a hunch D's templates are strong enough to pull this off without any source code generation ala JavaCC. The books I read on compilers say tokenization is a solved problem, so the theory part on what a good abstraction should be is done. What you think?

-- 
Tomek
Oct 22 2010
Tomek Sowiński wrote:
> Interesting idea. Here's another: D will soon need bindings for CORBA, Thrift, etc, so lexers will have to be written all over to grok interface files. Perhaps a generic tokenizer which can be parametrized with a lexical grammar would bring more ROI, I got a hunch D's templates are strong enough to pull this off without any source code generation ala JavaCC. The books I read on compilers say tokenization is a solved problem, so the theory part on what a good abstraction should be is done. What you think?

Lexers are so simple, it is less work to just build them by hand than use lexer generator tools.
Oct 22 2010
On 10/22/10 14:17 CDT, Walter Bright wrote:
> Tomek Sowiński wrote:
>> Interesting idea. Here's another: D will soon need bindings for CORBA, Thrift, etc, so lexers will have to be written all over to grok interface files. Perhaps a generic tokenizer which can be parametrized with a lexical grammar would bring more ROI, I got a hunch D's templates are strong enough to pull this off without any source code generation ala JavaCC. The books I read on compilers say tokenization is a solved problem, so the theory part on what a good abstraction should be is done. What you think?
>
> Lexers are so simple, it is less work to just build them by hand than use lexer generator tools.

I wrote a C++ lexer. It wasn't at all easy except if I compared it against the work necessary to build a full compiler.

Andrei
Oct 22 2010
On 10/22/10 14:02 CDT, Tomek Sowiński wrote:
> On 22-10-2010 at 00:01:21, Walter Bright <newshound2 digitalmars.com> wrote:
>> As we all know, tool support is important for D's success. Making tools easier to build will help with that.
>>
>> To that end, I think we need a lexer for the standard library - std.lang.d.lex. It would be helpful in writing color syntax highlighting filters, pretty printers, repl, doc generators, static analyzers, and even D compilers. It should:
>>
>> 1. support a range interface for its input, and a range interface for its output
>> 2. optionally not generate lexical errors, but just try to recover and continue
>> 3. optionally return comments and ddoc comments as tokens
>> 4. the tokens should be a value type, not a reference type
>> 5. generally follow along with the C++ one so that they can be maintained in tandem
>>
>> It can also serve as the basis for creating a javascript implementation that can be embedded into web pages for syntax highlighting, and eventually an std.lang.d.parse.
>>
>> Anyone want to own this?
>
> Interesting idea. Here's another: D will soon need bindings for CORBA, Thrift, etc, so lexers will have to be written all over to grok interface files. Perhaps a generic tokenizer which can be parametrized with a lexical grammar would bring more ROI, I got a hunch D's templates are strong enough to pull this off without any source code generation ala JavaCC. The books I read on compilers say tokenization is a solved problem, so the theory part on what a good abstraction should be is done. What you think?

Yes. IMHO writing a D tokenizer is a wasted effort. We need a tokenizer generator. I have in mind the entire implementation of a simple design, but never had the time to execute on it. The tokenizer would work like this:

alias Lexer!(
    "+", "PLUS",
    "-", "MINUS",
    "+=", "PLUS_EQ",
    ...
    "if", "IF",
    "else", "ELSE"
    ...
) DLexer;

Such a declaration generates numeric values DLexer.PLUS etc. and generates an efficient code that extracts a stream of tokens from a stream of text. Each token in the token stream has the ID and the text.

Comments, strings etc. can be handled in one of several ways but that's a longer discussion. The undertaking is doable but nontrivial.

Andrei
Oct 22 2010
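[Editor's note: the Lexer!(...) template above is only a design sketch and does not exist. As a hint that D's compile-time machinery is up to the job, here is a small standalone sketch of one ingredient: generating the token-id enum from the (pattern, name) pairs with CTFE and a string mixin. makeTokenEnum and TokId are invented names.]

string makeTokenEnum(specs...)()
{
    string code = "enum TokId { ";
    foreach (i, s; specs)
    {
        static if (i % 2 == 1)        // every second entry is a token name
            code ~= s ~ ", ";
    }
    return code ~ "}";
}

mixin(makeTokenEnum!("+", "PLUS", "-", "MINUS", "if", "IF", "else", "ELSE")());

static assert(TokId.PLUS != TokId.ELSE);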
"Andrei Alexandrescu" <SeeWebsiteForEmail erdani.org> wrote in message news:i9spsa$ll0$1 digitalmars.com...On 10/22/10 14:02 CDT, Tomek Sowinski wrote:FWIW, I've been converting my Goldie lexing/parsing library/toolset ( http://www.dsource.org/projects/goldie ) to D2/Phobos, and that should have a release sometime in the next couple months or so. I'm not sure it would really be appropriate for Phobos since it's pretty range-ified yet, probably doesn't use Phobos coding conventions, and relies on one of my other libraries/tools. But it does do generalized lexing/parsing (LALR) via the GOLD ( http://www.devincook.com/goldparser/ ) grammar file formats, can optionally generate source files for better compile-time checking (for instance, so Token!"<Statemnt>" will generate a compile-time error), has full documentation, and I'm working on a tool/lib that will compile the grammars without having to use the Windows/GUI-based GOLD Parser Builder tool.Dnia 22-10-2010 o 00:01:21 Walter Bright <newshound2 digitalmars.com> napisal(a):Yes. IMHO writing a D tokenizer is a wasted effort. We need a tokenizer generator.As we all know, tool support is important for D's success. Making tools easier to build will help with that. To that end, I think we need a lexer for the standard library - std.lang.d.lex. It would be helpful in writing color syntax highlighting filters, pretty printers, repl, doc generators, static analyzers, and even D compilers.Interesting idea. Here's another: D will soon need bindings for CORBA, Thrift, etc, so lexers will have to be written all over to grok interface files. Perhaps a generic tokenizer which can be parametrized with a lexical grammar would bring more ROI, I got a hunch D's templates are strong enough to pull this off without any source code generation ala JavaCC. The books I read on compilers say tokenization is a solved problem, so the theory part on what a good abstraction should be is done. What you think?
Oct 22 2010
Andrei Alexandrescu Wrote:I have in mind the entire implementation of a simple design, but never had the time to execute on it. The tokenizer would work like this: alias Lexer!( "+", "PLUS", "-", "MINUS", "+=", "PLUS_EQ", ... "if", "IF", "else", "ELSE" ... ) DLexer; Such a declaration generates numeric values DLexer.PLUS etc. and generates an efficient code that extracts a stream of tokens from a stream of text. Each token in the token stream has the ID and the text.What about, say, floating-point literals? It seems like the first element of a pair might have to be a regex pattern.
Oct 22 2010
On 10/22/10 16:28 CDT, Sean Kelly wrote:Andrei Alexandrescu Wrote:Yah, with regard to such regular patterns (strings, comments, numbers, identifiers) there are at least two possibilities that I see: 1. Go the full route of allowing regexen in the definition. This is very hard because you need to generate an efficient (N|D)FA during compilation. 2. Pragmatically allow "fallthrough" routines, i.e. if nothing in the compile-time table matches, just call onUnrecognizedString(). In conjunction with a few simple specialized functions, that makes it very simple to define arbitrarily complex lexers where the bulk of the work (and the most tedious part) is done by the D compiler. AndreiI have in mind the entire implementation of a simple design, but never had the time to execute on it. The tokenizer would work like this: alias Lexer!( "+", "PLUS", "-", "MINUS", "+=", "PLUS_EQ", ... "if", "IF", "else", "ELSE" ... ) DLexer; Such a declaration generates numeric values DLexer.PLUS etc. and generates an efficient code that extracts a stream of tokens from a stream of text. Each token in the token stream has the ID and the text.What about, say, floating-point literals? It seems like the first element of a pair might have to be a regex pattern.
Oct 22 2010
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:On 10/22/10 16:28 CDT, Sean Kelly wrote:For the second, that may push the work of recognizing some lexical elements into the parser. For example, a comment may be defined as /**/, which if there is no lexical definition of a comment means that it parses as four distinct valid tokens, div mul mul div.Andrei Alexandrescu Wrote:Yah, with regard to such regular patterns (strings, comments, numbers, identifiers) there are at least two possibilities that I see: 1. Go the full route of allowing regexen in the definition. This is very hard because you need to generate an efficient (N|D)FA during compilation. 2. Pragmatically allow "fallthrough" routines, i.e. if nothing in the compile-time table matches, just call onUnrecognizedString(). In conjunction with a few simple specialized functions, that makes it very simple to define arbitrarily complex lexers where the bulk of the work (and the most tedious part) is done by the D compiler.I have in mind the entire implementation of a simple design, but never had the time to execute on it. The tokenizer would work like this: alias Lexer!( "+", "PLUS", "-", "MINUS", "+=", "PLUS_EQ", ... "if", "IF", "else", "ELSE" ... ) DLexer; Such a declaration generates numeric values DLexer.PLUS etc. and generates an efficient code that extracts a stream of tokens from a stream of text. Each token in the token stream has the ID and the text.What about, say, floating-point literals? It seems like the first element of a pair might have to be a regex pattern.
Oct 23 2010
Sean Kelly <sean invisibleduck.org> wrote:Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:Or maybe not. A /* could be CommentBegin. I'll have to think on it a bit more.On 10/22/10 16:28 CDT, Sean Kelly wrote:For the second, that may push the work of recognizing some lexical elements into the parser. For example, a comment may be defined as /**/, which if there is no lexical definition of a comment means that it parses as four distinct valid tokens, div mul mul div.Andrei Alexandrescu Wrote:Yah, with regard to such regular patterns (strings, comments, numbers, identifiers) there are at least two possibilities that I see: 1. Go the full route of allowing regexen in the definition. This is very hard because you need to generate an efficient (N|D)FA during compilation. 2. Pragmatically allow "fallthrough" routines, i.e. if nothing in the compile-time table matches, just call onUnrecognizedString(). In conjunction with a few simple specialized functions, that makes it very simple to define arbitrarily complex lexers where the bulk of the work (and the most tedious part) is done by the D compiler.I have in mind the entire implementation of a simple design, but never had the time to execute on it. The tokenizer would work like this: alias Lexer!( "+", "PLUS", "-", "MINUS", "+=", "PLUS_EQ", ... "if", "IF", "else", "ELSE" ... ) DLexer; Such a declaration generates numeric values DLexer.PLUS etc. and generates an efficient code that extracts a stream of tokens from a stream of text. Each token in the token stream has the ID and the text.What about, say, floating-point literals? It seems like the first element of a pair might have to be a regex pattern.
Oct 23 2010
Sean Kelly <sean invisibleduck.org> wrote:Sean Kelly <sean invisibleduck.org> wrote:I still think it won't work. The stuff inside the comment would come through as a string of random tokens. Also, the // comment is EOL sensitive, and this info Ian normally communicated to the parser.Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:Or maybe not. A /* could be CommentBegin. I'll have to think on it a bit more.On 10/22/10 16:28 CDT, Sean Kelly wrote:For the second, that may push the work of recognizing some lexical elements into the parser. For example, a comment may be defined as /**/, which if there is no lexical definition of a comment means that it parses as four distinct valid tokens, div mul mul div.Andrei Alexandrescu Wrote:Yah, with regard to such regular patterns (strings, comments, numbers, identifiers) there are at least two possibilities that I see: 1. Go the full route of allowing regexen in the definition. This is very hard because you need to generate an efficient (N|D)FA during compilation. 2. Pragmatically allow "fallthrough" routines, i.e. if nothing in the compile-time table matches, just call onUnrecognizedString(). In conjunction with a few simple specialized functions, that makes it very simple to define arbitrarily complex lexers where the bulk of the work (and the most tedious part) is done by the D compiler.I have in mind the entire implementation of a simple design, but never had the time to execute on it. The tokenizer would work like this: alias Lexer!( "+", "PLUS", "-", "MINUS", "+=", "PLUS_EQ", ... "if", "IF", "else", "ELSE" ... ) DLexer; Such a declaration generates numeric values DLexer.PLUS etc. and generates an efficient code that extracts a stream of tokens from a stream of text. Each token in the token stream has the ID and the text.What about, say, floating-point literals? It seems like the first element of a pair might have to be a regex pattern.
Oct 23 2010
On 10/23/10 11:44 CDT, Sean Kelly wrote:Andrei Alexandrescu<SeeWebsiteForEmail erdani.org> wrote:I was thinking comments could be easily caught by simple routines: alias Lexer!( "+", "PLUS", "-", "MINUS", "+=", "PLUS_EQ", ... "/*", q{parseNonNestedComment("*/")}, "/+", q{parseNestedComment("+/")}, "//", q{parseOneLineComment()}, ... "if", "IF", "else", "ELSE", ... ) DLexer; During compilation, such non-tokens are recognized as code by the lexer generator and called appropriately. A comprehensive library of such routines completes a useful library. AndreiOn 10/22/10 16:28 CDT, Sean Kelly wrote:For the second, that may push the work of recognizing some lexical elements into the parser. For example, a comment may be defined as /**/, which if there is no lexical definition of a comment means that it parses as four distinct valid tokens, div mul mul div.Andrei Alexandrescu Wrote:Yah, with regard to such regular patterns (strings, comments, numbers, identifiers) there are at least two possibilities that I see: 1. Go the full route of allowing regexen in the definition. This is very hard because you need to generate an efficient (N|D)FA during compilation. 2. Pragmatically allow "fallthrough" routines, i.e. if nothing in the compile-time table matches, just call onUnrecognizedString(). In conjunction with a few simple specialized functions, that makes it very simple to define arbitrarily complex lexers where the bulk of the work (and the most tedious part) is done by the D compiler.I have in mind the entire implementation of a simple design, but never had the time to execute on it. The tokenizer would work like this: alias Lexer!( "+", "PLUS", "-", "MINUS", "+=", "PLUS_EQ", ... "if", "IF", "else", "ELSE" ... ) DLexer; Such a declaration generates numeric values DLexer.PLUS etc. and generates an efficient code that extracts a stream of tokens from a stream of text. Each token in the token stream has the ID and the text.What about, say, floating-point literals? It seems like the first element of a pair might have to be a regex pattern.
Oct 23 2010
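[Editor's note: parseNestedComment above is hypothetical. A sketch of what such a canned routine could look like for D's nesting /+ ... +/ comments; it assumes the caller has already consumed the opening "/+", returns the comment body, and advances the input past the matching "+/".]

string scanNestedComment(ref string src)
{
    size_t depth = 1, i = 0;
    while (i < src.length)
    {
        if (i + 1 < src.length && src[i] == '/' && src[i + 1] == '+')
        {
            ++depth;
            i += 2;
        }
        else if (i + 1 < src.length && src[i] == '+' && src[i + 1] == '/')
        {
            if (--depth == 0)
            {
                auto text = src[0 .. i];   // comment body, without delimiters
                src = src[i + 2 .. $];     // advance past the closing "+/"
                return text;
            }
            i += 2;
        }
        else
            ++i;
    }
    throw new Exception("unterminated /+ +/ comment");
}

unittest
{
    string s = "outer /+ inner +/ rest +/ tail";
    assert(scanNestedComment(s) == "outer /+ inner +/ rest ");
    assert(s == " tail");
}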
Andrei Alexandrescu wrote:During compilation, such non-tokens are recognized as code by the lexer generator and called appropriately. A comprehensive library of such routines completes a useful library.I agree, a set of "canned" and heavily optimized lexing functions for common things like identifiers, numbers, comments, etc., would make a lexing library much more practical. Those will work great for inventing DSLs, but for existing languages, the trouble is that the different languages have subtle variations on how they handle them. For example, D's numeric literals allow embedded underscores. Go doesn't overflow on numeric literals. Javascript has some wacky rules to distinguish a comment from a regex. The \uNNNN letters allowed in identifiers in some languages. So while a general purpose lexing library will be very useful, for lexing D code (and Java, Javascript, etc.) a custom one will probably be much more practical.
Oct 23 2010
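[Editor's note: an illustration of the "subtle variations" point above. scanDecimal is a made-up routine, not DMD's lexer; it takes the embedded-underscore rule as a compile-time switch so the same canned scanner can serve D-like and C-like literals.]

import std.ascii : isDigit;

size_t scanDecimal(bool allowUnderscores)(string src, out ulong value)
{
    size_t i;
    while (i < src.length &&
           (isDigit(src[i]) || (allowUnderscores && src[i] == '_' && i > 0)))
    {
        if (src[i] != '_')
            value = value * 10 + (src[i] - '0');
        ++i;
    }
    return i;   // number of characters consumed
}

unittest
{
    ulong v;
    assert(scanDecimal!true("1_000_000;", v) == 9 && v == 1_000_000);
    assert(scanDecimal!false("1_000", v) == 1 && v == 1);   // C-style: stop at '_'
}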
On 10/23/10 13:41 CDT, Walter Bright wrote:Andrei Alexandrescu wrote:I don't see these two in tension. "General" does not need entail "unsuitable for subtle particularities". It is more difficult, but not impossible. Again, a general parser that takes care of the 90% of the drudgework and gives enough hooks to do the remaining 10%, all as efficient as hand-written code. AndreiDuring compilation, such non-tokens are recognized as code by the lexer generator and called appropriately. A comprehensive library of such routines completes a useful library.I agree, a set of "canned" and heavily optimized lexing functions for common things like identifiers, numbers, comments, etc., would make a lexing library much more practical. Those will work great for inventing DSLs, but for existing languages, the trouble is that the different languages have subtle variations on how they handle them. For example, D's numeric literals allow embedded underscores. Go doesn't overflow on numeric literals. Javascript has some wacky rules to distinguish a comment from a regex. The \uNNNN letters allowed in identifiers in some languages. So while a general purpose lexing library will be very useful, for lexing D code (and Java, Javascript, etc.) a custom one will probably be much more practical.
Oct 23 2010
Andrei Alexandrescu wrote:I don't see these two in tension. "General" does not need entail "unsuitable for subtle particularities". It is more difficult, but not impossible. Again, a general parser that takes care of the 90% of the drudgework and gives enough hooks to do the remaining 10%, all as efficient as hand-written code.In general I agree with you, but that is a major project to do that and make it general, efficient, and easy to use - and then, one has to make a D lexer out of it. In the meantime, we have a lexer for D that would be straightforward to adapt to be a D library module. The only decisions that have to be made is what the API to it will be.
Oct 23 2010
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:On 10/23/10 11:44 CDT, Sean Kelly wrote:Ah, so the only issue is identifying the first set for a lexical element, in essence. That works.Andrei Alexandrescu<SeeWebsiteForEmail erdani.org> wrote:I was thinking comments could be easily caught by simple routines: alias Lexer!( "+", "PLUS", "-", "MINUS", "+=", "PLUS_EQ", ... "/*", q{parseNonNestedComment("*/")}, "/+", q{parseNestedComment("+/")}, "//", q{parseOneLineComment()}, ... "if", "IF", "else", "ELSE", ... ) DLexer; During compilation, such non-tokens are recognized as code by the lexer generator and called appropriately. A comprehensive library of such routines completes a useful library.On 10/22/10 16:28 CDT, Sean Kelly wrote:For the second, that may push the work of recognizing some lexical elements into the parser. For example, a comment may be defined as /**/, which if there is no lexical definition of a comment means that it parses as four distinct valid tokens, div mul mul div.Andrei Alexandrescu Wrote:Yah, with regard to such regular patterns (strings, comments, numbers, identifiers) there are at least two possibilities that I see: 1. Go the full route of allowing regexen in the definition. This is very hard because you need to generate an efficient (N|D)FA during compilation. 2. Pragmatically allow "fallthrough" routines, i.e. if nothing in the compile-time table matches, just call onUnrecognizedString(). In conjunction with a few simple specialized functions, that makes it very simple to define arbitrarily complex lexers where the bulk of the work (and the most tedious part) is done by the D compiler.I have in mind the entire implementation of a simple design, but never had the time to execute on it. The tokenizer would work like this: alias Lexer!( "+", "PLUS", "-", "MINUS", "+=", "PLUS_EQ", ... "if", "IF", "else", "ELSE" ... ) DLexer; Such a declaration generates numeric values DLexer.PLUS etc. and generates an efficient code that extracts a stream of tokens from a stream of text. Each token in the token stream has the ID and the text.What about, say, floating-point literals? It seems like the first element of a pair might have to be a regex pattern.
Oct 23 2010
"Andrei Alexandrescu" <SeeWebsiteForEmail erdani.org> wrote in message news:i9v8vq$2gvh$1 digitalmars.com...On 10/23/10 11:44 CDT, Sean Kelly wrote:What's wrong with regexes? That's pretty typical for lexers.Andrei Alexandrescu<SeeWebsiteForEmail erdani.org> wrote:I was thinking comments could be easily caught by simple routines: alias Lexer!( "+", "PLUS", "-", "MINUS", "+=", "PLUS_EQ", ... "/*", q{parseNonNestedComment("*/")}, "/+", q{parseNestedComment("+/")}, "//", q{parseOneLineComment()}, ... "if", "IF", "else", "ELSE", ... ) DLexer; During compilation, such non-tokens are recognized as code by the lexer generator and called appropriately. A comprehensive library of such routines completes a useful library.On 10/22/10 16:28 CDT, Sean Kelly wrote:For the second, that may push the work of recognizing some lexical elements into the parser. For example, a comment may be defined as /**/, which if there is no lexical definition of a comment means that it parses as four distinct valid tokens, div mul mul div.Andrei Alexandrescu Wrote:Yah, with regard to such regular patterns (strings, comments, numbers, identifiers) there are at least two possibilities that I see: 1. Go the full route of allowing regexen in the definition. This is very hard because you need to generate an efficient (N|D)FA during compilation. 2. Pragmatically allow "fallthrough" routines, i.e. if nothing in the compile-time table matches, just call onUnrecognizedString(). In conjunction with a few simple specialized functions, that makes it very simple to define arbitrarily complex lexers where the bulk of the work (and the most tedious part) is done by the D compiler.I have in mind the entire implementation of a simple design, but never had the time to execute on it. The tokenizer would work like this: alias Lexer!( "+", "PLUS", "-", "MINUS", "+=", "PLUS_EQ", ... "if", "IF", "else", "ELSE" ... ) DLexer; Such a declaration generates numeric values DLexer.PLUS etc. and generates an efficient code that extracts a stream of tokens from a stream of text. Each token in the token stream has the ID and the text.What about, say, floating-point literals? It seems like the first element of a pair might have to be a regex pattern.
Oct 23 2010
On 10/23/10 16:39 CDT, Nick Sabalausky wrote:"Andrei Alexandrescu"<SeeWebsiteForEmail erdani.org> wrote in message news:i9v8vq$2gvh$1 digitalmars.com... What's wrong with regexes? That's pretty typical for lexers.I mentioned that using regexes is possible but would make it much more difficult to generate good quality lexers. Besides, regexen are IMHO quite awkward at expressing certain things that can be easily parsed by hand, such as comments or recursive comments. Andrei
Oct 23 2010
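To spell out why recursion is the sticking point: D's /+ +/ comments nest, which no plain regular expression can express, yet a hand-written routine only needs a depth counter. A sketch (the name is invented; it assumes `s` starts just past the opening "/+" and advances it past the matching "+/"):

// Returns the body of a nesting /+ ... +/ comment and advances `s` past it.
string parseNestedComment(ref string s)
{
    size_t depth = 1, i;
    while (i < s.length && depth > 0)
    {
        if (i + 1 < s.length && s[i] == '/' && s[i + 1] == '+')
        {
            ++depth;
            i += 2;
        }
        else if (i + 1 < s.length && s[i] == '+' && s[i + 1] == '/')
        {
            --depth;
            i += 2;
        }
        else
            ++i;
    }
    assert(depth == 0, "unterminated /+ comment");
    auto content = s[0 .. i - 2];   // exclude the closing "+/"
    s = s[i .. $];
    return content;
}

unittest
{
    string src = " outer /+ inner +/ still outer +/ rest";
    assert(parseNestedComment(src) == " outer /+ inner +/ still outer ");
    assert(src == " rest");
}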
"Andrei Alexandrescu" <SeeWebsiteForEmail erdani.org> wrote in message news:i9vlep$8ao$1 digitalmars.com...On 10/23/10 16:39 CDT, Nick Sabalausky wrote:I see. Maybe a lexer 2.0 thing."Andrei Alexandrescu"<SeeWebsiteForEmail erdani.org> wrote in message news:i9v8vq$2gvh$1 digitalmars.com... What's wrong with regexes? That's pretty typical for lexers.I mentioned that using regexes is possible but would make it much more difficult to generate good quality lexers.Besides, regexen are IMHO quite awkward at expressing certain things that can be easily parsed by hand, such as comments//[^\n]*\n /\*(.|\*[^/])*\*/ Pretty simple as far as regexes go, and I'm far from a regex expert. Plus there's nothing stopping the use of a vastly improved regex syntax like GOLD uses ( http://www.devincook.com/goldparser/doc/grammars/define-terminals.htm ). In that, the two regexes above would look like: {LineCommentChar} = {Printable} - {LF} LineComment = '//' {LineCommentChar}* {LF} {BlockCommentChar} = {Printable} - [*] {BlockCommentCharNoSlash} = {BlockCommentChar} - [/] BlockComment = '/*' ({BlockCommentChar} | '*' {BlockCommentCharNoSlash})* '*/' And further syntactical improvement is easy to imagine, such as in-line character set creation.or recursive comments.Granted, although I think there is precident for regex engines that can handle matched nested pairs just fine.
Oct 23 2010
Nick Sabalausky wrote:What's wrong with regexes?They don't handle recursion.
Oct 23 2010
"Walter Bright" <newshound2 digitalmars.com> wrote in message news:i9vn3l$bd1$2 digitalmars.com...Nick Sabalausky wrote:Neither do plain-old strings. But regexes will get you farther than plain strings before needing to resort to customized lexing. But I'm a big data-driven fan anyway. If you're not than I can see why it wouldn't seem as appealing as it does to me. In any case, if I have a chance I might see about adapting my Goldie ( www.dsource.org/projects/goldie ) library to more Phobos-friendly requirements. It's already a fully-usable lexer/parser (and the lexer/parser parts can be used independantly), with a complete grammar description language and I already have misc related tools written. And it's mostly working on D2 already (just need the next DMD because it has a fix for a bug that's a breaker for one of the tools). So if I can get it into a state more suitable for Phobos then that might end up putting things ahead of where they would be if someone just started from scratch. The initial versions might not be completely Phobos-ified, but it could definitely get there (especially if I had some guidance from people with more Phobos2 experience than me). Would Walter & co be interested in this? If not, I won't bother, but if so, then I may give it a shot.What's wrong with regexes?They don't handle recursion.
Oct 23 2010
"Nick Sabalausky" <a a.a> wrote in message news:ia01q3$1i1a$1 digitalmars.com..."Walter Bright" <newshound2 digitalmars.com> wrote in message news:i9vn3l$bd1$2 digitalmars.com...And FWIW, I was already thnking about making some improvements to Goldie's API enyway.Nick Sabalausky wrote:Neither do plain-old strings. But regexes will get you farther than plain strings before needing to resort to customized lexing. But I'm a big data-driven fan anyway. If you're not than I can see why it wouldn't seem as appealing as it does to me. In any case, if I have a chance I might see about adapting my Goldie ( www.dsource.org/projects/goldie ) library to more Phobos-friendly requirements. It's already a fully-usable lexer/parser (and the lexer/parser parts can be used independantly), with a complete grammar description language and I already have misc related tools written. And it's mostly working on D2 already (just need the next DMD because it has a fix for a bug that's a breaker for one of the tools). So if I can get it into a state more suitable for Phobos then that might end up putting things ahead of where they would be if someone just started from scratch. The initial versions might not be completely Phobos-ified, but it could definitely get there (especially if I had some guidance from people with more Phobos2 experience than me). Would Walter & co be interested in this? If not, I won't bother, but if so, then I may give it a shot.What's wrong with regexes?They don't handle recursion.
Oct 23 2010
"Nick Sabalausky" <a a.a> wrote in message news:ia01sk$1i7s$1 digitalmars.com..."Nick Sabalausky" <a a.a> wrote in message news:ia01q3$1i1a$1 digitalmars.com...But that's all if you want generalized lexing or parsing though. If you just want "lexing D code"/"parsing D code", then IMO anything other than adapting parts of DDMD would be the wrong way to go."Walter Bright" <newshound2 digitalmars.com> wrote in message news:i9vn3l$bd1$2 digitalmars.com...And FWIW, I was already thnking about making some improvements to Goldie's API enyway.Nick Sabalausky wrote:Neither do plain-old strings. But regexes will get you farther than plain strings before needing to resort to customized lexing. But I'm a big data-driven fan anyway. If you're not than I can see why it wouldn't seem as appealing as it does to me. In any case, if I have a chance I might see about adapting my Goldie ( www.dsource.org/projects/goldie ) library to more Phobos-friendly requirements. It's already a fully-usable lexer/parser (and the lexer/parser parts can be used independantly), with a complete grammar description language and I already have misc related tools written. And it's mostly working on D2 already (just need the next DMD because it has a fix for a bug that's a breaker for one of the tools). So if I can get it into a state more suitable for Phobos then that might end up putting things ahead of where they would be if someone just started from scratch. The initial versions might not be completely Phobos-ified, but it could definitely get there (especially if I had some guidance from people with more Phobos2 experience than me). Would Walter & co be interested in this? If not, I won't bother, but if so, then I may give it a shot.What's wrong with regexes?They don't handle recursion.
Oct 23 2010
Nick Sabalausky:But that's all if you want generalized lexing or parsing though. If you just want "lexing D code"/"parsing D code", then IMO anything other than adapting parts of DDMD would be the wrong way to go.Is the DDMD licence compatible with the Phobos one? Is the DDMD author(s) willing? Bye, bearophile
Oct 23 2010
"bearophile" <bearophileHUGS lycos.com> wrote in message news:ia0410$1lju$1 digitalmars.com...Nick Sabalausky:I'd certainly hope so. If it isn't, then that would probably mean DMD's FE license is incompatible with Phobos. Which would be rather...weird. In any case, I asked that and a couple other Q's here, but haven't gotten an answer yet: http://www.dsource.org/forums/viewtopic.php?t=5627But that's all if you want generalized lexing or parsing though. If you just want "lexing D code"/"parsing D code", then IMO anything other than adapting parts of DDMD would be the wrong way to go.Is the DDMD licence compatible with the Phobos one? Is the DDMD author(s) willing?
Oct 23 2010
On Sun, 24 Oct 2010 06:55:22 +0400, Nick Sabalausky <a a.a> wrote:"bearophile" <bearophileHUGS lycos.com> wrote in message news:ia0410$1lju$1 digitalmars.com...Sorry, I wasn't checking the forum. IIRC DMD license is GPL so DDMD must be GPL too but I'm all for relicensing it as Boost.Nick Sabalausky:I'd certainly hope so. If it isn't, then that would probably mean DMD's FE license is incompatible with Phobos. Which would be rather...weird. In any case, I asked that and a couple other Q's here, but haven't gotten an answer yet: http://www.dsource.org/forums/viewtopic.php?t=5627But that's all if you want generalized lexing or parsing though. If you just want "lexing D code"/"parsing D code", then IMO anything other than adapting parts of DDMD would be the wrong way to go.Is the DDMD licence compatible with the Phobos one? Is the DDMD author(s) willing?
Oct 24 2010
"Denis Koroskin" <2korden gmail.com> wrote in message news:op.vk2na9bpo7cclz korden-pc...On Sun, 24 Oct 2010 06:55:22 +0400, Nick Sabalausky <a a.a> wrote:According to a random file I picked out of trunk, it's dual-licensed with GPL (not sure which version) and Artistic (also not sure which version) http://www.dsource.org/projects/dmd/browser/trunk/src/access.c"bearophile" <bearophileHUGS lycos.com> wrote in message news:ia0410$1lju$1 digitalmars.com...Sorry, I wasn't checking the forum. IIRC DMD license is GPL so DDMD must be GPL too but I'm all for relicensing it as Boost.Nick Sabalausky:I'd certainly hope so. If it isn't, then that would probably mean DMD's FE license is incompatible with Phobos. Which would be rather...weird. In any case, I asked that and a couple other Q's here, but haven't gotten an answer yet: http://www.dsource.org/forums/viewtopic.php?t=5627But that's all if you want generalized lexing or parsing though. If you just want "lexing D code"/"parsing D code", then IMO anything other than adapting parts of DDMD would be the wrong way to go.Is the DDMD licence compatible with the Phobos one? Is the DDMD author(s) willing?
Oct 24 2010
"Nick Sabalausky" <a a.a> wrote in message news:ia0v9p$11p$1 digitalmars.com..."Denis Koroskin" <2korden gmail.com> wrote in message news:op.vk2na9bpo7cclz korden-pc...That does surprise me though, since I'm pretty sure Phobos is Boost License. Anyone know why the difference?On Sun, 24 Oct 2010 06:55:22 +0400, Nick Sabalausky <a a.a> wrote:According to a random file I picked out of trunk, it's dual-licensed with GPL (not sure which version) and Artistic (also not sure which version) http://www.dsource.org/projects/dmd/browser/trunk/src/access.c"bearophile" <bearophileHUGS lycos.com> wrote in message news:ia0410$1lju$1 digitalmars.com...Sorry, I wasn't checking the forum. IIRC DMD license is GPL so DDMD must be GPL too but I'm all for relicensing it as Boost.Nick Sabalausky:I'd certainly hope so. If it isn't, then that would probably mean DMD's FE license is incompatible with Phobos. Which would be rather...weird. In any case, I asked that and a couple other Q's here, but haven't gotten an answer yet: http://www.dsource.org/forums/viewtopic.php?t=5627But that's all if you want generalized lexing or parsing though. If you just want "lexing D code"/"parsing D code", then IMO anything other than adapting parts of DDMD would be the wrong way to go.Is the DDMD licence compatible with the Phobos one? Is the DDMD author(s) willing?
Oct 24 2010
Nick Sabalausky wrote:That does surprise me though, since I'm pretty sure Phobos is Boost License. Anyone know why the difference?Phobos is Boost licensed to enable maximum usage for any purpose. The dmd front end is GPL licensed in order to ensure it stays open source and to discourage closed source forks.
Oct 24 2010
On 2010-10-24 04:55, Nick Sabalausky wrote:"bearophile"<bearophileHUGS lycos.com> wrote in message news:ia0410$1lju$1 digitalmars.com...As Walter wrote in the first post of this thread: "generally follow along with the C++ one so that they can be maintained in tandem" and in another post: "Since the license is mine, I can change the D version to the Boost license, no problem." http://www.digitalmars.com/pnews/read.php?server=news.digitalmars.com&group=digitalmars.D&artnum=120221 -- /Jacob CarlborgNick Sabalausky:I'd certainly hope so. If it isn't, then that would probably mean DMD's FE license is incompatible with Phobos. Which would be rather...weird. In any case, I asked that and a couple other Q's here, but haven't gotten an answer yet: http://www.dsource.org/forums/viewtopic.php?t=5627But that's all if you want generalized lexing or parsing though. If you just want "lexing D code"/"parsing D code", then IMO anything other than adapting parts of DDMD would be the wrong way to go.Is the DDMD licence compatible with the Phobos one? Is the DDMD author(s) willing?
Oct 24 2010
Nick Sabalausky wrote:Would Walter & co be interested in this? If not, I won't bother, but if so, then I may give it a shot.The problem is I never have used parser/lexer generators, so I am not really in a good position to review it.
Oct 23 2010
"Walter Bright" <newshound2 digitalmars.com> wrote in message news:ia0cfv$22kp$1 digitalmars.com...Nick Sabalausky wrote:Understandable. FWIW though, Goldie isn't really lexer/parse generator per se. Traditional lexer/parser generators like lex/yacc or ANTLR will actually generate the source code for a lexer or parser. Goldie just has a single lexer and parser, both already pre-written. They're just completely data-driven: Compared to the generators, Goldie's lexer is more like a general regex engine that simultaneously matches against multiple pre-compiled "regexes". By pre-compiled, I mean turned into a DFA - which is currently done by a separate non-source-available tool I didn't write, but I'm going to be writing my own version soon. By "regexes", I mean they're functionally regexes, but they're written in a much easier-to-read syntax than the typical PCRE. Goldie's parser is really just a rather typical (from what I understand) LALR parser. I don't know how much you know about LALR's, but the parser itself is naturally grammar-independent (at least as described in CS texts). Using an LALR involves converting the grammar completely into a table of states and lookaheads (single-token lookahead; unlike LL, any more than that is never really needed), and then the actual parser is directed entirely by that table (much like how regexes are converted to data, ie DFA, and then processed generically), so it's completely grammar-independent. And of, course, the actual lexer and parser can be optimized/rewritten/whatever with minimal impact on everything else. If anyone's interested, further details are here(1): http://www.devincook.com/goldparser/ Goldie does have optional code-generation capabilities, but it's entirely for the sake of providing a better statically-checked API tailored to your grammar (ex: to use D's type system to ensure at compile-time, instead of run-time, that token names are valid and that BNF rules you reference actually exist). It doesn't actually affect the lexer/parser in any non-trivial way. (1): By that site's terminology, Goldie would technically be a "GOLD Engine", plus some additional tools. But, my current work on Goldie will cut that actual "GOLD Parser Builder" program completely out-of-the-loop (but it will still maintain compatibility with it for anyone who wants to use it).Would Walter & co be interested in this? If not, I won't bother, but if so, then I may give it a shot.The problem is I never have used parser/lexer generators, so I am not really in a good position to review it.
Oct 23 2010
Nick Sabalausky wrote:If anyone's interested, further details are here(1): http://www.devincook.com/goldparser/It looks nice, but in clicking around on FAQ, documentation, getting started, etc., I can't find any example code.
Oct 24 2010
"Walter Bright" <newshound2 digitalmars.com> wrote in message news:ia0pce$2pbk$1 digitalmars.com...Nick Sabalausky wrote:Well, that's because that program (GOLD Parser Builder) is just a tool that takes in a grammar description file and spits out the lexer/parser DFA/LALR tables. Then you use any GOLD-compatible engine in any langauge (such as Goldie) to load the DFA/LALR tables and use them to lex/parse. (But again, I'm currently working on code that will do that without having to use GOLD Parser Builder.) Here's some specific links for Goldie, and keep in mind that 1. I already have it pretty much converted to D2/Phobos in trunk (it used to be D1/Tango), 2. The API is not final and definitely open to suggestions (I have a few ideas already), 3. Any suggestions for improvements to the documentation, are, of course, welcome too, 4. Like I've said, in the next official release, using "GOLD Parser Builder" won't actually be required. Main Goldie Project page: http://www.dsource.org/projects/goldie Documentation for latest official release: http://www.semitwist.com/goldiedocs/current/Docs/ Samples directory in trunk: http://www.dsource.org/projects/goldie/browser/trunk/src/samples Slightly old documentation for the samples: http://www.semitwist.com/goldiedocs/current/Docs/SampleApps/ There's two "calculator" samples. They're the same, but correspond to the two different styles Goldie supports. One, "dynamic", doesn't involve any source-code-generation step and can load and use any arbitrary grammar at runtime (neat usages of this are shown in the "ParseAnything" sample and in the "Parse" tool http://www.semitwist.com/goldiedocs/current/Docs/Tools/Parse/ ). The other, "static", does involve generating some source-code (via a comand-line tool), but that gives you an API that's statically-checked against the grammar. The differences and pros/cons between these two styles are explained here (let me know if it's unclear): http://www.semitwist.com/goldiedocs/current/Docs/APIOver/StatVsDyn/If anyone's interested, further details are here(1): http://www.devincook.com/goldparser/It looks nice, but in clicking around on FAQ, documentation, getting started, etc., I can't find any example code.
Oct 24 2010
It looks like a solid engine, and a nice tool. Does it belong as part of Phobos? I don't know. What do other D users think?
Oct 24 2010
Nick Sabalausky wrote:http://www.semitwist.com/goldiedocs/current/Docs/APIOver/StatVsDyn/One question I have is how does it compare with Spirit? That would be its main counterpart in the C++ space.
Oct 24 2010
On 24/10/2010 18:19, Walter Bright wrote:Nick Sabalausky wrote:Spirit is an LL parser, so it's not really suitable for human-edited input, as doing exact error reporting is tricky. -- My enormous talent is exceeded only by my outrageous laziness. http://www.ssTk.co.ukhttp://www.semitwist.com/goldiedocs/current/Docs/APIOver/StatVsDyn/One question I have is how does it compare with Spirit? That would be its main counterpart in the C++ space.
Oct 24 2010
"Walter Bright" <newshound2 digitalmars.com> wrote in message news:ia1ps7$1fq5$2 digitalmars.com...Nick Sabalausky wrote:Can't say I'm really familiar with Spirit. From a brief lookover, these are my impresions of the differences: Spirit: Grammar is embedded into your source code as actual C++ code. Goldie: Grammar is defined in a domain-specfic language. But either one could probably have a wrapper to work the other way. Spirit: Uses (abuses?) operator overloading (Although, apperently SpiritD doesn't inherit Spirit's operator overloading: http://www.sstk.co.uk/spiritd.php ) Goldie: Operator overloading isn't really applicable, because of using a DSL. As they stand, Spirit seems like it could be pretty handly for simple, quick little DSLs, ex, things for which Goldie might seem like overkill. But Goldie's interface could probably be improved to compete pretty well in those cases. OTOH, Goldie's approach (being based on GOLD) has a deliberate separation between grammar and parsing, which has it's own benefits; for instance, grammar definitions can be re-used for any purpose.http://www.semitwist.com/goldiedocs/current/Docs/APIOver/StatVsDyn/One question I have is how does it compare with Spirit? That would be its main counterpart in the C++ space.
Oct 24 2010
Nick Sabalausky wrote:Can't say I'm really familiar with Spirit. From a brief lookover, these are my impresions of the differences: Spirit: Grammar is embedded into your source code as actual C++ code. Goldie: Grammar is defined in a domain-specfic language. But either one could probably have a wrapper to work the other way. Spirit: Uses (abuses?) operator overloading (Although, apperently SpiritD doesn't inherit Spirit's operator overloading: http://www.sstk.co.uk/spiritd.php ) Goldie: Operator overloading isn't really applicable, because of using a DSL. As they stand, Spirit seems like it could be pretty handly for simple, quick little DSLs, ex, things for which Goldie might seem like overkill. But Goldie's interface could probably be improved to compete pretty well in those cases. OTOH, Goldie's approach (being based on GOLD) has a deliberate separation between grammar and parsing, which has it's own benefits; for instance, grammar definitions can be re-used for any purpose.Does Goldie have (like Spirit) a set of canned routines for things like numeric literals? Can the D version of Goldie be turned into one file?
Oct 24 2010
"Walter Bright" <newshound2 digitalmars.com> wrote in message news:ia2duj$2j7e$1 digitalmars.com...Nick Sabalausky wrote:No, but such things can easily be provided in the docs for simple copy-paste. For instance: DecimalLiteral = {Number} ({Number} | '_')* HexLiteral = '0' [xX] ({Number} | [ABCDEFabcdef_])+ Identifier = ('_' | {Letter}) ('_' | {AlphaNumeric})* {StringChar} = {Printable} - ["] StringLiteral = '"' ({StringChar} | '\' {Printable})* '"' All one would need to do to use those is copy-paste them into their grammar definition. Some sort of import mechanism could certainly be added though, to allow for selective import of pre-defined things like that. There are many pre-defined character sets though (and others can be manually-created, of course): http://www.devincook.com/goldparser/doc/grammars/character-sets.htmCan't say I'm really familiar with Spirit. From a brief lookover, these are my impresions of the differences: Spirit: Grammar is embedded into your source code as actual C++ code. Goldie: Grammar is defined in a domain-specfic language. But either one could probably have a wrapper to work the other way. Spirit: Uses (abuses?) operator overloading (Although, apperently SpiritD doesn't inherit Spirit's operator overloading: http://www.sstk.co.uk/spiritd.php ) Goldie: Operator overloading isn't really applicable, because of using a DSL. As they stand, Spirit seems like it could be pretty handly for simple, quick little DSLs, ex, things for which Goldie might seem like overkill. But Goldie's interface could probably be improved to compete pretty well in those cases. OTOH, Goldie's approach (being based on GOLD) has a deliberate separation between grammar and parsing, which has it's own benefits; for instance, grammar definitions can be re-used for any purpose.Does Goldie have (like Spirit) a set of canned routines for things like numeric literals?Can the D version of Goldie be turned into one file?Assuming just the library and not the included tools (many of which could be provided as part of the library, though), and not counting files generated for the static-style, then yes, but it would probably be a bit long.
Oct 24 2010
Nick Sabalausky wrote:"Walter Bright" <newshound2 digitalmars.com> wrote in messageIn the regexp code, I provided special regexes for email addresses and URLs. Those are hard to get right, so it's a large convenience to provide them. Also, many literals can be fairly complex, and evaluating them can produce errors (such as integer overflow in the numeric literals). Having canned ones makes it much quicker for a user to get going. I'm guessing that a numeric literal is returned as a string. Is this string allocated on the heap? If so, it's a performance problem. Storage allocation costs figure large when trying to lex millions of lines.Does Goldie have (like Spirit) a set of canned routines for things like numeric literals?No, but such things can easily be provided in the docs for simple copy-paste. For instance: DecimalLiteral = {Number} ({Number} | '_')* HexLiteral = '0' [xX] ({Number} | [ABCDEFabcdef_])+ Identifier = ('_' | {Letter}) ('_' | {AlphaNumeric})* {StringChar} = {Printable} - ["] StringLiteral = '"' ({StringChar} | '\' {Printable})* '"' All one would need to do to use those is copy-paste them into their grammar definition. Some sort of import mechanism could certainly be added though, to allow for selective import of pre-defined things like that.There are many pre-defined character sets though (and others can be manually-created, of course): http://www.devincook.com/goldparser/doc/grammars/character-sets.htmLong files aren't a problem. That's why we have .di files! I worry more about clutter.Can the D version of Goldie be turned into one file?Assuming just the library and not the included tools (many of which could be provided as part of the library, though), and not counting files generated for the static-style, then yes, but it would probably be a bit long.
Oct 24 2010
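The fix Nick mentions is worth spelling out, because it is exactly the property Walter is after: a D string is a pointer/length slice, so a token can refer back into the source text with no heap allocation at all. A sketch; the field names are illustrative, not Goldie's actual ones.

struct Token
{
    int    id;     // e.g. DLexer.PLUS
    string text;   // slice of the original source -- no copy, no allocation
    size_t line;   // for error reporting
}

Token makeToken(string source, size_t start, size_t end, int id, size_t line)
{
    return Token(id, source[start .. end], line);   // slicing never copies
}

unittest
{
    auto src = "a += 1";
    auto tok = makeToken(src, 2, 4, 0, 1);
    assert(tok.text == "+=");
    assert(tok.text.ptr == src.ptr + 2);   // same memory as the source
}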
"Walter Bright" <newshound2 digitalmars.com> wrote in message news:ia34up$ldb$1 digitalmars.com...In the regexp code, I provided special regexes for email addresses and URLs. Those are hard to get right, so it's a large convenience to provide them. Also, many literals can be fairly complex, and evaluating them can produce errors (such as integer overflow in the numeric literals). Having canned ones makes it much quicker for a user to get going.I'm not sure what exectly you're suggesting in these two paragraphs? (Or just commenting?)I'm guessing that a numeric literal is returned as a string. Is this string allocated on the heap? If so, it's a performance problem. Storage allocation costs figure large when trying to lex millions of lines.Good point. I've just checked and there is allocation going on for each terminal lexed. But thanks to D's awesomeness, I can easily fix that to just use a slice of the original source string. I'll do that...Long files aren't a problem. That's why we have .di files! I worry more about clutter.I really find long files to be a pain to read and edit. It would be nice if Then, modules with a lot of code could be broken down as appropriate for their maintainers without having to bother the users with the "module blah.all" workaround (which Goldie currently uses, but I realize isn't normal Phobos style). AIUI, .di files don't really solve that. There is one other other minor related issue, though. One of my big principles for Goldie is flexibility. So in addition to the basic API that most people would use, I like to expose lower-level APIs for people who might want to sidestep certain parts of Goldie, or provide other less-typical but potentially useful things. But such things shouldn't be automatically imported for typical users, so that sort of stuff would be best left to a separate-but-related module. Maybe it's just too late over here for me, but can you be more specific on "clutter"? Do you mean like API clutter?
Oct 24 2010
Nick Sabalausky wrote:"Walter Bright" <newshound2 digitalmars.com> wrote in message news:ia34up$ldb$1 digitalmars.com...Does Goldie's lexer not convert numeric literals to integer values?In the regexp code, I provided special regexes for email addresses and URLs. Those are hard to get right, so it's a large convenience to provide them. Also, many literals can be fairly complex, and evaluating them can produce errors (such as integer overflow in the numeric literals). Having canned ones makes it much quicker for a user to get going.I'm not sure what exectly you're suggesting in these two paragraphs? (Or just commenting?)Are all tokens returned as strings?I'm guessing that a numeric literal is returned as a string. Is this string allocated on the heap? If so, it's a performance problem. Storage allocation costs figure large when trying to lex millions of lines.Good point. I've just checked and there is allocation going on for each terminal lexed. But thanks to D's awesomeness, I can easily fix that to just use a slice of the original source string. I'll do that...If I may suggest, leave the low level stuff out of the api until demand for it justifies it. It's hard to predict just what will be useful, so I suggest conservatism rather than kitchen sink. It can always be added later, but it's really hard to remove.Long files aren't a problem. That's why we have .di files! I worry more about clutter.I really find long files to be a pain to read and edit. It would be nice if Then, modules with a lot of code could be broken down as appropriate for their maintainers without having to bother the users with the "module blah.all" workaround (which Goldie currently uses, but I realize isn't normal Phobos style). AIUI, .di files don't really solve that. There is one other other minor related issue, though. One of my big principles for Goldie is flexibility. So in addition to the basic API that most people would use, I like to expose lower-level APIs for people who might want to sidestep certain parts of Goldie, or provide other less-typical but potentially useful things. But such things shouldn't be automatically imported for typical users, so that sort of stuff would be best left to a separate-but-related module.Maybe it's just too late over here for me, but can you be more specific on "clutter"? Do you mean like API clutter?That too, but I meant a clutter of files. Long files aren't a problem with D.
Oct 25 2010
"Walter Bright" <newshound2 digitalmars.com> wrote in message news:ia3c3r$14k8$1 digitalmars.com...Does Goldie's lexer not convert numeric literals to integer values? Are all tokens returned as strings?Goldie's lexer (and parser) are based on the GOLD system ( http://www.devincook.com/goldparser/ ) which is deliberately independent of both grammar and implementation language. As such, it doesn't know anything about what the specific terminals actually represent (There are 4 exceptions though: Comment tokens, Whitespace tokens, an "Error" token (ie, for lex errors), and the EOF token.) So the lexed data is always represented as a string. Although, the lexer actually returns an array of "class Token" ( http://www.semitwist.com/goldiedocs/current/Docs/APIRef/Token/#Token ). To get the original data that got lexed or parsed into that token, you call "toString()". (BTW, there are currently different "modes" of "toString()" for non-terminals, but I'm considering just ripping them all out and replacing them with a single "return a slice from the start of the first terminal to the end of the last terminal" - unless you think it would be useful to get a representation of the non-terminal's original data sans comments/whitespace, or with comments/whitespace converted to a single space.) I'm not sure that calling "to!whatever(token.toString())" is really all that much of a problem for user code.If I may suggest, leave the low level stuff out of the api until demand for it justifies it. It's hard to predict just what will be useful, so I suggest conservatism rather than kitchen sink. It can always be added later, but it's really hard to remove.That may be a good idea.That too, but I meant a clutter of files. Long files aren't a problem with D.Well, again, it may not be a problem with DMD, but I really think reading/editing a long file is a pain regardless of language. Maybe we just have different ideas of "long file"? To put it into numbers: At the moment, Goldie's library (not counting tools and the optional generated "static-mode" files) is about 3200 lines, including comment/blank lines. That size would be pretty unwieldy to maintain as a single source file, particularly since Goldie has a natural internal organization. Personally, I'd much rather have a clutter of source files than a cluttered source file. (But of course, I don't go to Java extremes and put *every* tiny little thing in a separate file.) As long as the complexity of having multiple files isn't passed along to user code (hence the frequent "module foo.all" idiom), then I can't say I really see a problem.
Oct 25 2010
Nick Sabalausky wrote:"Walter Bright" <newshound2 digitalmars.com> wrote in message news:ia3c3r$14k8$1 digitalmars.com...Consider a string literal, say "abc\"def". With Goldie's method, I infer this string has to be scanned twice. Once to find its limits, and the second to convert it to the actual string. The latter is user code and will have to replicate whatever Goldie did.Does Goldie's lexer not convert numeric literals to integer values? Are all tokens returned as strings?Goldie's lexer (and parser) are based on the GOLD system ( http://www.devincook.com/goldparser/ ) which is deliberately independent of both grammar and implementation language. As such, it doesn't know anything about what the specific terminals actually represent (There are 4 exceptions though: Comment tokens, Whitespace tokens, an "Error" token (ie, for lex errors), and the EOF token.) So the lexed data is always represented as a string. Although, the lexer actually returns an array of "class Token" ( http://www.semitwist.com/goldiedocs/current/Docs/APIRef/Token/#Token ). To get the original data that got lexed or parsed into that token, you call "toString()". (BTW, there are currently different "modes" of "toString()" for non-terminals, but I'm considering just ripping them all out and replacing them with a single "return a slice from the start of the first terminal to the end of the last terminal" - unless you think it would be useful to get a representation of the non-terminal's original data sans comments/whitespace, or with comments/whitespace converted to a single space.) I'm not sure that calling "to!whatever(token.toString())" is really all that much of a problem for user code.What Goldie will be compared against is Spirit. Spirit is a reasonably successful add-on to C++. Goldie doesn't have to do things the same way as Spirit (expression templates - ugh), but it should be as easy to use and at least as powerful.If I may suggest, leave the low level stuff out of the api until demand for it justifies it. It's hard to predict just what will be useful, so I suggest conservatism rather than kitchen sink. It can always be added later, but it's really hard to remove.That may be a good idea.Actually, I think 3200 lines is of moderate, not large, size :-)That too, but I meant a clutter of files. Long files aren't a problem with D.Well, again, it may not be a problem with DMD, but I really think reading/editing a long file is a pain regardless of language. Maybe we just have different ideas of "long file"? To put it into numbers: At the moment, Goldie's library (not counting tools and the optional generated "static-mode" files) is about 3200 lines, including comment/blank lines. That size would be pretty unwieldy to maintain as a single source file, particularly since Goldie has a natural internal organization.Personally, I'd much rather have a clutter of source files than a cluttered source file. (But of course, I don't go to Java extremes and put *every* tiny little thing in a separate file.) As long as the complexity of having multiple files isn't passed along to user code (hence the frequent "module foo.all" idiom), then I can't say I really see a problem.I tend to just not like having to constantly grep to see which file XXX is in.
Oct 25 2010
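For contrast, the single-pass style Walter describes looks roughly like this: escapes are decoded while the literal is being scanned, so the "abc\"def" example is walked only once. A sketch handling just \\ and \" (a real D lexer handles many more escapes); it assumes `s` starts just past the opening quote.

import std.array : appender;

string lexStringLiteral(ref string s)
{
    auto value = appender!string();
    size_t i;
    while (i < s.length && s[i] != '"')
    {
        if (s[i] == '\\' && i + 1 < s.length)
        {
            value.put(s[i + 1]);   // decode the escape as we go
            i += 2;
        }
        else
            value.put(s[i++]);
    }
    assert(i < s.length, "unterminated string literal");
    s = s[i + 1 .. $];             // skip the closing quote
    return value.data;
}

unittest
{
    string src = `abc\"def" rest`;
    assert(lexStringLiteral(src) == `abc"def`);
    assert(src == " rest");
}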
"Walter Bright" <newshound2 digitalmars.com> wrote in message news:ia59si$1r0j$1 digitalmars.com...Consider a string literal, say "abc\"def". With Goldie's method, I infer this string has to be scanned twice. Once to find its limits, and the second to convert it to the actual string.Yea, that is true. With that string in the input, the value given to the user code will be: assert(tokenObtainedFromGoldie.toString() == q{"abc\"def"}); That's a consequence of the grammar being separated from lexing/parsing implementation. You're right that that does seem less than ideal. Although I'm not sure how to remedy that without loosing the independence between grammar and lex/parse implementation that is the main point of the GOLD-based style. But there's something I don't quite understand about the approach you're suggesting: You seem to be suggesting that a terminal be progressively converted into its final form *as* it's still in the process of being recognized by the DFA. Which means, you don't know *what* you're supposed to be converting it into *while* you're converting it. Which means, you have to be speculatively converting it into all types of tokens that the current DFA state could possibly be on its way towards accepting (also, the DFA would need to contain a record of possible terminals for each DFA state). And then the result is thrown away if it turns out to be a different terminal. Is this correct? If so, is there generally enough lexical difference between the terminals that need such treatment to compensate for the extra processing needed in situations that are closer to worst-case (that is, in comparison to Goldie's current approach)? If all of that is so, then what would be your thoughts on this approach?: Suppose Goldie had a way to associate an optional "simultaneous/lockstep conversion" to a type of terminal. For instance: myLanguage.associateConversion("StringLiteral", new StringLiteralConverter()); Then, 'StringLiteralConverter' would be something that could be either user-provided or offered by Goldie (both ways would be supported). It would be some sort of class or something that had three basic functions: class StringLiteralConverter : ITerminalConverter { void process(dchar c) {...} // Or maybe this to make it possible to minimize allocations // in certain circumstances by utilizing slices: void process(dchar c, size_t indexIntoSource, string fullOrignalSource) {...} Variant emit() {...} void clear() {...} } Each state in the lexer's DFA would know which terminals it could possibly be processing. And for each of those terminals that has an associated converter, the lexer will call 'process()'. If a terminal is accepted, 'emit' is called to get the final result (and maybe do any needed finalization first), and then 'clear' is called on all converters that had been used. This feature would preclude the use of the actual "GOLD Parser Builder" program, but since I'm writing a tool to handle that functionality anyway, I'm not too concerned about that. Do you think that would work? Would its benefits be killed by the overhead introduced? If so, could those overheads be sufficiently reduced without scrapping the general idea?What Goldie will be compared against is Spirit. Spirit is a reasonably successful add-on to C++. Goldie doesn't have to do things the same way as Spirit (expression templates - ugh), but it should be as easy to use and at least as powerful.Understood.Diff'rent strokes, I guess. 
I've only ever had that problem with Tango, which seems to kinda follow from the Java-STD-lib school of API design (no offense intended, Tango guys). But if I'm working on something that involves different sections of a codebase, which is very frequent, then I find it to be quite a pain to constantly scroll all around instead of just Ctrl-Tabbing between open files in different tabs.Personally, I'd much rather have a clutter of source files than a cluttered source file. (But of course, I don't go to Java extremes and put *every* tiny little thing in a separate file.) As long as the complexity of having multiple files isn't passed along to user code (hence the frequent "module foo.all" idiom), then I can't say I really see a problem.I tend to just not like having to constantly grep to see which file XXX is in.
Oct 25 2010
Nick Sabalausky wrote:"Walter Bright" <newshound2 digitalmars.com> wrote in message news:ia59si$1r0j$1 digitalmars.com...Probably that's why I don't use lexer generators. Building lexers is the simplest part of building a compiler, and I've always been motivated by trying to make it as fast as possible. To specifically answer your question, yes, in the lexers I make, you know you're parsing a string, so you process it as you parse it.Consider a string literal, say "abc\"def". With Goldie's method, I infer this string has to be scanned twice. Once to find its limits, and the second to convert it to the actual string.Yea, that is true. With that string in the input, the value given to the user code will be: assert(tokenObtainedFromGoldie.toString() == q{"abc\"def"}); That's a consequence of the grammar being separated from lexing/parsing implementation. You're right that that does seem less than ideal. Although I'm not sure how to remedy that without loosing the independence between grammar and lex/parse implementation that is the main point of the GOLD-based style. But there's something I don't quite understand about the approach you're suggesting: You seem to be suggesting that a terminal be progressively converted into its final form *as* it's still in the process of being recognized by the DFA. Which means, you don't know *what* you're supposed to be converting it into *while* you're converting it. Which means, you have to be speculatively converting it into all types of tokens that the current DFA state could possibly be on its way towards accepting (also, the DFA would need to contain a record of possible terminals for each DFA state). And then the result is thrown away if it turns out to be a different terminal. Is this correct? If so, is there generally enough lexical difference between the terminals that need such treatment to compensate for the extra processing needed in situations that are closer to worst-case (that is, in comparison to Goldie's current approach)?If all of that is so, then what would be your thoughts on this approach?: Suppose Goldie had a way to associate an optional "simultaneous/lockstep conversion" to a type of terminal. For instance: myLanguage.associateConversion("StringLiteral", new StringLiteralConverter()); Then, 'StringLiteralConverter' would be something that could be either user-provided or offered by Goldie (both ways would be supported). It would be some sort of class or something that had three basic functions: class StringLiteralConverter : ITerminalConverter { void process(dchar c) {...} // Or maybe this to make it possible to minimize allocations // in certain circumstances by utilizing slices: void process(dchar c, size_t indexIntoSource, string fullOrignalSource) {...} Variant emit() {...} void clear() {...} } Each state in the lexer's DFA would know which terminals it could possibly be processing. And for each of those terminals that has an associated converter, the lexer will call 'process()'. If a terminal is accepted, 'emit' is called to get the final result (and maybe do any needed finalization first), and then 'clear' is called on all converters that had been used. This feature would preclude the use of the actual "GOLD Parser Builder" program, but since I'm writing a tool to handle that functionality anyway, I'm not too concerned about that. Do you think that would work? Would its benefits be killed by the overhead introduced? If so, could those overheads be sufficiently reduced without scrapping the general idea?I don't know. 
I'd have to study the issue for a while. I suggest taking a look at dmd's lexer and compare. I'm not sure what Spirit's approach to this is.What Goldie will be compared against is Spirit. Spirit is a reasonably successful add-on to C++. Goldie doesn't have to do things the same way as Spirit (expression templates - ugh), but it should be as easy to use and at least as powerful.Understood.
Oct 25 2010
"Walter Bright" <newshound2 digitalmars.com> wrote in message news:ia5j41$2bnk$1 digitalmars.com...To specifically answer your question, yes, in the lexers I make, you know you're parsing a string, so you process it as you parse it. ... I don't know. I'd have to study the issue for a while. I suggest taking a look at dmd's lexer and compare. I'm not sure what Spirit's approach to this is.I've taken a deeper look at Spirit's docs: In the older Spirit 1.x, the lexing is handled as part of the parsing. The structure of it definitely suggests it should be easy for it to do all token-conversion right as the string is being lexed, although I couldn't tell whether or not it actually did so (I'd have to look at the source). But, since Spirit 1.x doesn't handle lexing separately from parsing, I *think* backtracking (it *is* a backtracking parser) results in re-lexing, even for terminals that never get special processing, such as keywords (But I'm not completely certain because I don't have much experience with LL). In Spirit 2.x, standard usage involves having the lexing separate from parsing. I didn't see anything at all in the docs for Spirit 2.x that seemed to suggest even the possibility of it processing tokens as they're lexed. However, Spirit is designed with heavy policy-based customizability in mind, so such a thing might still possible in Spirit 2.x...But if so, it's definitely an advanced feature (or just really poorly documented). I have thought of another way to get such an ability into Goldie, and it would be very easy-to-use, but it would also be a fairly non-trivial to implement. And really, I'm starting to question again how important it would *really* be, at least initially. When I think of typical code, usually only a small amount of it is made up of the the sorts of terminals that would need extra processing. I have to admit, I still have no idea whether or not it would be worth it to get Goldie into Phobos. Maybe, maybe not, I dunno. I think popular opinion would probably be the best gauge of that. It seems like we're the only ones still in this thread, though...maybe that's a bad sign? ;) I do still think that if your primary goal is to provide parsing of D code through Phobos, then adapting DDMD would be the best best. Goldie would be more appropriate if customized lexing/parsing is the goal.
Oct 26 2010
Nick Sabalausky:I've taken a deeper look at Spirit's docs:I have not used Spirit, but from what I have read, it doesn't scale (the compilation becomes too much slower when the system you have built becomes bigger). Bye, bearophile
Oct 26 2010
"bearophile" <bearophileHUGS lycos.com> wrote in message news:ia6a0h$nst$1 digitalmars.com...Nick Sabalausky:I think that's just because it's C++ though. I'd bet a D lib that worked the same way would probably do a lot better. In any case, I started writing a comparison of the main fundamental differences between Spirit and Goldie, and it ended up kinda rambling and not so just-the-main-fundamentals. But the gist was: Spirit is very flexible in how grammars are defined and processed, and Goldie is very flexible in what you can do with a given grammar once it's written (ie, how much mileage you can get out of it without changing one line of grammar and without designing it from the start to be flexible). Goldie does get some of that "flexibility in what you can do with it" though by tossing in some features and some limitations/requirements that Spirit leaves as "if you want it, put it in yourself (more or less manually), otherwise you don't pay a price for it." I think both approaches have their merits. Although I haven't a clue which is best for Phobos, or if Phobos even needs either.I've taken a deeper look at Spirit's docs:I have not used Spirit, but from what I have read, it doesn't scale (the compilation becomes too much slower when the system you have built becomes bigger).
Oct 26 2010
bearophile, on 26 October at 06:20 you wrote:Nick Sabalausky:I can confirm that, at least for Spirit 1, and for simple things it looks "nice" (on the C++ scale), but for real, more complex things, the resulting code is really a mess. -- Leandro Lucarella (AKA luca) http://llucax.com.ar/ ---------------------------------------------------------------------- GPG Key: 5F5A8D05 (F8CD F9A7 BF00 5431 4145 104C 949E BFB6 5F5A 8D05) ---------------------------------------------------------------------- A can of diet coke will float in water While a can of regular coke will sinkI've taken a deeper look at Spirit's docs:I have not used Spirit, but from what I have read, it doesn't scale (the compilation becomes too much slower when the system you have built becomes bigger).
Oct 26 2010
On 26.10.2010 15:55, Leandro Lucarella wrote:bearophile, on 26 October at 06:20 you wrote:yupp - Spirit feels right on the integration side, but becomes more and more evil when stuff gets bigger. A compile-time EBNF-script parser would do better, especially when the EBNF script comes through a compile-time file include and can be used/developed from outside in an IDE, like GOLD parsers. A compile-time parser could "generate" the stub code like Spirit does, but without being too much inside the language itself.Nick Sabalausky: > I've taken a deeper look at Spirit's docs: I have not used Spirit, but from what I have read, it doesn't scale (the compilation becomes too much slower when the system you have built becomes bigger).I can confirm that, at least for Spirit 1, and for simple things it looks "nice" (on the C++ scale), but for real, more complex things, the resulting code is really a mess.
Oct 26 2010
On 26.10.2010 16:48, dennis luehring wrote:On 26.10.2010 15:55, Leandro Lucarella wrote:that, combined with compile-time features, something like what the bsn-goldparser does: http://code.google.com/p/bsn-goldparser/ I think this all is very, very doable in D.bearophile, on 26 October at 06:20 you wrote:yupp - Spirit feels right on the integration side, but becomes more and more evil when stuff gets bigger. A compile-time EBNF-script parser would do better, especially when the EBNF script comes through a compile-time file include and can be used/developed from outside in an IDE, like GOLD parsers. A compile-time parser could "generate" the stub code like Spirit does, but without being too much inside the language itself.Nick Sabalausky: > I've taken a deeper look at Spirit's docs: I have not used Spirit, but from what I have read, it doesn't scale (the compilation becomes too much slower when the system you have built becomes bigger).I can confirm that, at least for Spirit 1, and for simple things it looks "nice" (on the C++ scale), but for real, more complex things, the resulting code is really a mess.
Oct 26 2010
"dennis luehring" <dl.soluz gmx.net> wrote in message news:ia6s3b$1q90$1 digitalmars.com...Am 26.10.2010 16:48, schrieb dennis luehring:Goldie (and any GOLD-based system, really) should scale up pretty well. The only possible scaling-up issues would be: 1. Splitting a large grammar across multiple files is not yet supported (and if I do add support for that in Goldie, and I may, then the "GOLD Parser Builder" IDE wouldn't know how to handle it). classic-style ASP, or anything that involves a preprocessing step that hasn't already been done) aren't really supported yet. Spirit 1.x should be able to handle that, at least in some cases. I think Spirit 2.x's separation of lexing and parsing may have some trouble with it though. 3. I haven't had a chance to add any sort of character set optimization yet, so grammars that allow a large amount of Unicode characters will probably be slow to generate into tables and slow to lex. At least until I get around to taking care of that. I've never actually used Spirit, but its scaling up issues do seem to be a fairly fundamental issue with it's design (particularly so since it's C++). Although they do say on their site that some C++ compilers can handle Spirit without compile time growing exponentially in relation to grammar complexity.Am 26.10.2010 15:55, schrieb Leandro Lucarella: yupp - Spirit feels right on the integration-side, but becomes more and more evil when stuff gets biggerThere's one problem with doing things via CTFE that us D folks often overlook: You can't use a build tool like make/rake/scons to detect when that particular data doesn't need to be recomputed and can thus be skipped. (Although it may be possible to manually make it work that way *if* CTFE gains the ability to access the filesystem.) I'm not opposed to the idea of making Goldie's compiling-a-grammar (ie, "process a grammar into the appropriate tables") ctfe-able, but it does already work in a way that you only need to compile a grammar into tables when you change the grammar (and changing a grammar is needed less frequently in Goldie than in Spirit because in Goldie no processing code is ever embedded into the grammar.).a compiletime-ebnf-script parser would do better, especially when the ebnf-script comes through compiletime-file-include and can be used/developed from outside in an ide like gold parsersthat combined with compiletime-features something like the bsn-parse do http://code.google.com/p/bsn-goldparser/ i think this all is very very doable in DYea, I was pretty impressed with BSN. I definitely want to do something like that for Goldie, but I have a somewhat different idea in mind: I'm thinking of enhancing the grammar definition language so that all the information on how to construct an AST is right there in the grammar definition itself, and can thus be completely automated by Goldie. This would be in line with GOLD's philosophy and benefits of keeping the grammar definition separate from the processing code. And it would also be a step towards the idea I've had in mind since before Goldie was Goldie of being able to automate (or partially automate) generalized language translation/transformation.
Oct 26 2010
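A minimal sketch of what the CTFE'd compiling-a-grammar step mentioned in the post above could look like. The function compileGrammarToTables() and the LexTables layout are hypothetical placeholders, not Goldie's real API; the grammar could also be pulled from a file at compile time with import() and the -J switch, but a string literal keeps the example self-contained:

struct LexTables
{
    // Hypothetical DFA tables; real ones would be much larger.
    int[][]  transitions;
    string[] acceptingSymbols;
}

// Hypothetical grammar-to-tables step: this is roughly what "compile a
// grammar" would have to become for the CTFE route to work.
LexTables compileGrammarToTables(string grammarText)
{
    // ... parse grammarText and build DFA/LALR tables here ...
    // Stubbed out: just record the grammar's size so CTFE has something to do.
    return LexTables([[cast(int) grammarText.length]], []);
}

// Kept as a literal here; import("simple.grm") with -J would read it from disk.
enum grammarSource = `<Item> ::= 'a' | 'b'`;

// Evaluated entirely during compilation (CTFE); the resulting tables are
// baked into the binary as static data, so nothing is recomputed at run time.
enum simpleTables = compileGrammarToTables(grammarSource);

void main()
{
    static assert(simpleTables.transitions[0][0] > 0);
}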
On 2010-10-26 04:44, Nick Sabalausky wrote:"Walter Bright"<newshound2 digitalmars.com> wrote in message news:ia59si$1r0j$1 digitalmars.com...Consider a string literal, say "abc\"def". With Goldie's method, I infer this string has to be scanned twice: once to find its limits, and a second time to convert it to the actual string.Yea, that is true. With that string in the input, the value given to the user code will be: assert(tokenObtainedFromGoldie.toString() == q{"abc\"def"}); That's a consequence of the grammar being separated from the lexing/parsing implementation. You're right that that does seem less than ideal, although I'm not sure how to remedy it without losing the independence between grammar and lex/parse implementation that is the main point of the GOLD-based style. But there's something I don't quite understand about the approach you're suggesting: you seem to be suggesting that a terminal be progressively converted into its final form *as* it's still in the process of being recognized by the DFA. Which means you don't know *what* you're supposed to be converting it into *while* you're converting it.I don't have much knowledge in this area, but isn't this what look-ahead is for? Just look ahead (hopefully) one character and decide what to convert to. -- /Jacob Carlborg
Oct 26 2010
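For concreteness, a minimal sketch of the single-pass approach being contrasted with Goldie's two-scan behaviour above: the escaped content is converted to its final value *while* the end of the literal is being located, so the text is only scanned once. It handles just \" and \\ for brevity; a real lexer covers all escape sequences.

import std.array : appender;

string lexStringLiteral(string src, ref size_t i)
{
    assert(src[i] == '"');
    ++i;                        // skip opening quote
    auto value = appender!string();
    while (i < src.length && src[i] != '"')
    {
        if (src[i] == '\\' && i + 1 < src.length)
        {
            ++i;                // skip the backslash, keep the escaped char
            value.put(src[i]);
        }
        else
            value.put(src[i]);
        ++i;
    }
    ++i;                        // skip closing quote
    return value.data;
}

void main()
{
    size_t pos = 0;
    assert(lexStringLiteral(`"abc\"def"`, pos) == `abc"def`);
    assert(pos == 10);          // cursor sits just past the literal
}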
On 22-10-2010 at 21:48:49, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:On 10/22/10 14:02 CDT, Tomek Sowiński wrote:Interesting idea. Here's another: D will soon need bindings for CORBA, Thrift, etc, so lexers will have to be written all over to grok interface files. Perhaps a generic tokenizer which can be parametrized with a lexical grammar would bring more ROI. I have a hunch D's templates are strong enough to pull this off without any source code generation a la JavaCC. The books I read on compilers say tokenization is a solved problem, so the theory part on what a good abstraction should be is done. What do you think?Yes. IMHO writing a D tokenizer is a wasted effort. We need a tokenizer generator. I have in mind the entire implementation of a simple design, but never had the time to execute on it. The tokenizer would work like this: alias Lexer!( "+", "PLUS", "-", "MINUS", "+=", "PLUS_EQ", ... "if", "IF", "else", "ELSE" ... ) DLexer;Yes. One remark: native language constructs scale better for a grammar: enum TokenDef : string { Digit = "[0-9]", Letter = "[a-zA-Z_]", Identifier = Letter~'('~Letter~'|'~Digit~')', ... Plus = "+", Minus = "-", PlusEq = "+=", ... If = "if", Else = "else", ... } alias Lexer!TokenDef DLexer; BTW, there's a related bug: http://d.puremagic.com/issues/show_bug.cgi?id=2950Such a declaration generates numeric values DLexer.PLUS etc. and generates efficient code that extracts a stream of tokens from a stream of text. Each token in the token stream has the ID and the text.All good ideas.Comments, strings etc. can be handled in one of several ways but that's a longer discussion.The discussion's started anyhow. So what're the options? -- Tomek
Oct 22 2010
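A toy, hand-rolled illustration of the Lexer!(pattern, name, ...) interface discussed above. This is not the DFA-based generator Andrei has in mind: it only handles literal token patterns, picks the longest match by brute force, and skips whitespace ad hoc. The point is only to show that the declaration style compiles and is usable today.

struct Token
{
    int    id;    // index into the pattern/name pair list
    string text;  // slice of the input that was matched
}

struct Lexer(specs...)
{
    static assert(specs.length % 2 == 0, "expected pattern/name pairs");

    static Token[] tokenize(string src)
    {
        Token[] result;
        while (src.length)
        {
            // skip whitespace
            if (src[0] == ' ' || src[0] == '\n') { src = src[1 .. $]; continue; }

            // try every literal pattern against the front; longest match wins
            size_t bestLen = 0;
            int bestId = -1;
            foreach (i, spec; specs)
            {
                static if (i % 2 == 0)
                {
                    if (src.length >= spec.length
                        && src[0 .. spec.length] == spec
                        && spec.length > bestLen)
                    {
                        bestLen = spec.length;
                        bestId = cast(int) i / 2;
                    }
                }
            }
            if (bestId >= 0)
            {
                result ~= Token(bestId, src[0 .. bestLen]);
                src = src[bestLen .. $];
                continue;
            }
            // unknown character: skip it (a real lexer would report an error)
            src = src[1 .. $];
        }
        return result;
    }
}

alias DLexer = Lexer!(
    "+",  "PLUS",
    "+=", "PLUS_EQ",
    "-",  "MINUS",
    "if", "IF");

void main()
{
    auto toks = DLexer.tokenize("if + += -");
    assert(toks.length == 4);
    assert(toks[1].text == "+");
    assert(toks[2].text == "+=");   // longest match, not PLUS then '='
}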
On 22/10/2010 20:48, Andrei Alexandrescu wrote:On 10/22/10 14:02 CDT, Tomek Sowi艅ski wrote:Agreed, of all the things desired for D, a D tokenizer would rank pretty low I think. Another thing, even though a tokenizer generator would be much more desirable, I wonder if it is wise to have that in the standard library? It does not seem to be of wide enough interest to be in a standard library. (Out of curiosity, how many languages have such a thing in their standard library?) -- Bruno Medeiros - Software EngineerDnia 22-10-2010 o 00:01:21 Walter Bright <newshound2 digitalmars.com> napisa艂(a):Yes. IMHO writing a D tokenizer is a wasted effort. We need a tokenizer generator.As we all know, tool support is important for D's success. Making tools easier to build will help with that. To that end, I think we need a lexer for the standard library - std.lang.d.lex. It would be helpful in writing color syntax highlighting filters, pretty printers, repl, doc generators, static analyzers, and even D compilers. It should: 1. support a range interface for its input, and a range interface for its output 2. optionally not generate lexical errors, but just try to recover and continue 3. optionally return comments and ddoc comments as tokens 4. the tokens should be a value type, not a reference type 5. generally follow along with the C++ one so that they can be maintained in tandem It can also serve as the basis for creating a javascript implementation that can be embedded into web pages for syntax highlighting, and eventually an std.lang.d.parse. Anyone want to own this?Interesting idea. Here's another: D will soon need bindings for CORBA, Thrift, etc, so lexers will have to be written all over to grok interface files. Perhaps a generic tokenizer which can be parametrized with a lexical grammar would bring more ROI, I got a hunch D's templates are strong enough to pull this off without any source code generation ala JavaCC. The books I read on compilers say tokenization is a solved problem, so the theory part on what a good abstraction should be is done. What you think?
Nov 19 2010
On Friday 19 November 2010 13:03:53 Bruno Medeiros wrote:On 22/10/2010 20:48, Andrei Alexandrescu wrote:esOn 10/22/10 14:02 CDT, Tomek Sowi=C5=84ski wrote:Dnia 22-10-2010 o 00:01:21 Walter Bright <newshound2 digitalmars.com> =20 napisa=C5=82(a):As we all know, tool support is important for D's success. Making tools easier to build will help with that. =20 To that end, I think we need a lexer for the standard library - std.lang.d.lex. It would be helpful in writing color syntax highlighting filters, pretty printers, repl, doc generators, static analyzers, and even D compilers. =20 It should: =20 1. support a range interface for its input, and a range interface for its output 2. optionally not generate lexical errors, but just try to recover and continue 3. optionally return comments and ddoc comments as tokens 4. the tokens should be a value type, not a reference type 5. generally follow along with the C++ one so that they can be maintained in tandem =20 It can also serve as the basis for creating a javascript implementation that can be embedded into web pages for syntax highlighting, and eventually an std.lang.d.parse. =20 Anyone want to own this?=20 Interesting idea. Here's another: D will soon need bindings for CORBA, Thrift, etc, so lexers will have to be written all over to grok interface files. Perhaps a generic tokenizer which can be parametrized with a lexical grammar would bring more ROI, I got a hunch D's templat=We want to make it easy for tools to be built to work on and deal with D co= de.=20 An IDE, for example, needs to be able to tokenize and parse D code. A progr= am=20 like lint needs to be able to tokenize and parse D code. By providing a lex= er=20 and parser in the standard library, we are making it far easier for such to= ols=20 to be written, and they could be of major benefit to the D community. Sure,= the=20 average program won't need to lex or parse D, but some will, and making it = easy=20 to do will make it a lot easier for such programs to be written. =2D Jonathan M Davis=20 Agreed, of all the things desired for D, a D tokenizer would rank pretty low I think. =20 Another thing, even though a tokenizer generator would be much more desirable, I wonder if it is wise to have that in the standard library? It does not seem to be of wide enough interest to be in a standard library. (Out of curiosity, how many languages have such a thing in their standard library?)are strong enough to pull this off without any source code generation ala JavaCC. The books I read on compilers say tokenization is a solved problem, so the theory part on what a good abstraction should be is done. What you think?=20 Yes. IMHO writing a D tokenizer is a wasted effort. We need a tokenizer generator.
Nov 19 2010
On 19/11/2010 21:27, Jonathan M Davis wrote:On Friday 19 November 2010 13:03:53 Bruno Medeiros wrote:And by providing a lexer and a parser outside the standard library, wouldn't it make it just as easy for those tools to be written? What's the advantage of being in the standard library? I see only disadvantages: to begin with it potentially increases the time that Walter or other Phobos contributors may have to spend on it, even if it's just reviewing patches or making sure the code works. -- Bruno Medeiros - Software EngineerOn 22/10/2010 20:48, Andrei Alexandrescu wrote:We want to make it easy for tools to be built to work on and deal with D code. An IDE, for example, needs to be able to tokenize and parse D code. A program like lint needs to be able to tokenize and parse D code. By providing a lexer and parser in the standard library, we are making it far easier for such tools to be written, and they could be of major benefit to the D community. Sure, the average program won't need to lex or parse D, but some will, and making it easy to do will make it a lot easier for such programs to be written. - Jonathan M DavisOn 10/22/10 14:02 CDT, Tomek Sowi艅ski wrote:Agreed, of all the things desired for D, a D tokenizer would rank pretty low I think. Another thing, even though a tokenizer generator would be much more desirable, I wonder if it is wise to have that in the standard library? It does not seem to be of wide enough interest to be in a standard library. (Out of curiosity, how many languages have such a thing in their standard library?)Dnia 22-10-2010 o 00:01:21 Walter Bright<newshound2 digitalmars.com> napisa艂(a):Yes. IMHO writing a D tokenizer is a wasted effort. We need a tokenizer generator.As we all know, tool support is important for D's success. Making tools easier to build will help with that. To that end, I think we need a lexer for the standard library - std.lang.d.lex. It would be helpful in writing color syntax highlighting filters, pretty printers, repl, doc generators, static analyzers, and even D compilers. It should: 1. support a range interface for its input, and a range interface for its output 2. optionally not generate lexical errors, but just try to recover and continue 3. optionally return comments and ddoc comments as tokens 4. the tokens should be a value type, not a reference type 5. generally follow along with the C++ one so that they can be maintained in tandem It can also serve as the basis for creating a javascript implementation that can be embedded into web pages for syntax highlighting, and eventually an std.lang.d.parse. Anyone want to own this?Interesting idea. Here's another: D will soon need bindings for CORBA, Thrift, etc, so lexers will have to be written all over to grok interface files. Perhaps a generic tokenizer which can be parametrized with a lexical grammar would bring more ROI, I got a hunch D's templates are strong enough to pull this off without any source code generation ala JavaCC. The books I read on compilers say tokenization is a solved problem, so the theory part on what a good abstraction should be is done. What you think?
Nov 19 2010
On Friday, November 19, 2010 13:53:12 Bruno Medeiros wrote:On 19/11/2010 21:27, Jonathan M Davis wrote:orOn Friday 19 November 2010 13:03:53 Bruno Medeiros wrote:On 22/10/2010 20:48, Andrei Alexandrescu wrote:On 10/22/10 14:02 CDT, Tomek Sowi=C5=84ski wrote:Dnia 22-10-2010 o 00:01:21 Walter Bright<newshound2 digitalmars.com> =20 napisa=C5=82(a):As we all know, tool support is important for D's success. Making tools easier to build will help with that. =20 To that end, I think we need a lexer for the standard library - std.lang.d.lex. It would be helpful in writing color syntax highlighting filters, pretty printers, repl, doc generators, static analyzers, and even D compilers. =20 It should: =20 1. support a range interface for its input, and a range interface f=A,its output 2. optionally not generate lexical errors, but just try to recover and continue 3. optionally return comments and ddoc comments as tokens 4. the tokens should be a value type, not a reference type 5. generally follow along with the C++ one so that they can be maintained in tandem =20 It can also serve as the basis for creating a javascript implementation that can be embedded into web pages for syntax highlighting, and eventually an std.lang.d.parse. =20 Anyone want to own this?=20 Interesting idea. Here's another: D will soon need bindings for CORB=edThrift, etc, so lexers will have to be written all over to grok interface files. Perhaps a generic tokenizer which can be parametriz=erwith a lexical grammar would bring more ROI, I got a hunch D's templates are strong enough to pull this off without any source code generation ala JavaCC. The books I read on compilers say tokenization is a solved problem, so the theory part on what a good abstraction should be is done. What you think?=20 Yes. IMHO writing a D tokenizer is a wasted effort. We need a tokeniz=tygenerator.=20 Agreed, of all the things desired for D, a D tokenizer would rank pret=If nothing, else, it makes it easier to keep in line with dmd itself. Since= the=20 dmd front end is LGPL, it's not possible to have a Boost port of it (like t= he=20 Phobos version will be) without Walter's consent. And I'd be surprised if h= e did=20 that for a third party library (though he seems to be pretty open on a lot = of=20 that kind of stuff). Not to mention, Walter and the core developers are _ex= actly_=20 the kind of people that you want working on a lexer or parser of the langua= ge=20 itself, because they're the ones who work on it. =2D Jonathan M Davis=20 And by providing a lexer and a parser outside the standard library, wouldn't it make it just as easy for those tools to be written? What's the advantage of being in the standard library? I see only disadvantages: to begin with it potentially increases the time that Walter or other Phobos contributors may have to spend on it, even if it's just reviewing patches or making sure the code works.low I think. =20 Another thing, even though a tokenizer generator would be much more desirable, I wonder if it is wise to have that in the standard library? It does not seem to be of wide enough interest to be in a standard library. (Out of curiosity, how many languages have such a thing in their standard library?)=20 We want to make it easy for tools to be built to work on and deal with D code. An IDE, for example, needs to be able to tokenize and parse D code. A program like lint needs to be able to tokenize and parse D code. 
By providing a lexer and parser in the standard library, we are making it far easier for such tools to be written, and they could be of major benefit to the D community. Sure, the average program won't need to lex or parse D, but some will, and making it easy to do will make it a lot easier for such programs to be written. =20 - Jonathan M Davis
Nov 19 2010
On 19/11/2010 22:02, Jonathan M Davis wrote:On Friday, November 19, 2010 13:53:12 Bruno Medeiros wrote:Eh? That license argument doesn't make sense: if the lexer and parser were to be based on DMD itself, then putting it in the standard library is equivalent (in licensing terms) to licensing the lexer and parser parts of DMD under Boost. More correctly, what I mean by equivalent is that there is no reason why Walter would allow one thing and not the other... (because in both cases he would have to issue that license) As for your second argument, yes, Walter and the core developers would be the most qualified people to work on it, no question about it. But my point is, I don't think Walter and the Phobos core devs should be working on it, because it takes time away from other things that are much more important. Their time is precious. I think our main point of disagreement is just how important a D lexer and/or parser would be. I think it would be of very low interest, definitely not a "major benefit to the D community". For starters, regarding its use in IDEs: I think we are *ages* away from the point where an IDE based on D only will be able to compete with IDEs based on Eclipse/Visual-Studio/Xcode/etc. I think much sooner we will have a full D compiler written in D than a (competitive) D IDE written in D. We barely have mature GUI libraries, from what I understand. (What may be more realistic is an IDE partially written in D, and otherwise based on Eclipse/Visual-Studio/etc., but even so, I think it would be hard to compete with other non-D IDEs) -- Bruno Medeiros - Software EngineerOn 19/11/2010 21:27, Jonathan M Davis wrote: And by providing a lexer and a parser outside the standard library, wouldn't it make it just as easy for those tools to be written? What's the advantage of being in the standard library? I see only disadvantages: to begin with, it potentially increases the time that Walter or other Phobos contributors may have to spend on it, even if it's just reviewing patches or making sure the code works.If nothing else, it makes it easier to keep in line with dmd itself. Since the dmd front end is LGPL, it's not possible to have a Boost port of it (like the Phobos version will be) without Walter's consent. And I'd be surprised if he did that for a third party library (though he seems to be pretty open on a lot of that kind of stuff). Not to mention, Walter and the core developers are _exactly_ the kind of people that you want working on a lexer or parser of the language itself, because they're the ones who work on it. - Jonathan M Davis
Nov 19 2010
== Quote from Bruno Medeiros (brunodomedeiros+spam com.gmail)'s articleI think much sooner we will have a full D compiler written in D than a (competitive) D IDE written in D.I agree. I do like the suggestion for developing the D grammar in Antlr though and it is something I would be interested in working on. With this in hand, the prospect of adding D support as was done for C++ to Eclipse or Netbeans becomes much more feasible. Has a complete grammar been defined/compiled or is anyone currently working in this direction? Having a robust IDE seems far more important than whether it is written in D itself.
Nov 19 2010
On 19/11/2010 23:45, Todd VanderVeen wrote:== Quote from Bruno Medeiros (brunodomedeiros+spam com.gmail)'s articleSee the comment I made below, to Michael Stover. ( news://news.digitalmars.com:119/ic71pa$1lev$1 digitalmars.com ) -- Bruno Medeiros - Software EngineerI think much sooner we will have a full D compiler written in D than a (competitive) D IDE written in D.I agree. I do like the suggestion for developing the D grammar in Antlr though and it is something I would be interested in working on. With this in hand, the prospect of adding D support as was done for C++ to Eclipse or Netbeans becomes much more feasible. Has a complete grammar been defined/compiled or is anyone currently working in this direction? Having a robust IDE seems far more important than whether it is written in D itself.
Nov 19 2010
On Friday, November 19, 2010 15:17:35 Bruno Medeiros wrote:On 19/11/2010 22:02, Jonathan M Davis wrote:It's very different to have D implementation of something - which is based on a C++ version but definitely different in some respects - be under Boost and generally available, and having the C++ implementation be under Boost - particularly when the C++ version covers far more than just a lexer and parser. Someone _could_ port the D code back to C++ and have that portion useable under Boost, but that's a lot more work than just taking the C++ code and using it, and it's only the portions of the compiler which were ported to D to which could be re-used that way. And since the Boost code could be used in a commercial product while the LGPL is more restricted, it could make a definite difference. I'm not a licensing expert, and I'm not an expert on what Walter does and doesn't want done with his code, but he put the compiler front end under the LGPL, not Boost, and he's given his permission to have the lexer alone ported to D and put under the Boost license in the standard library, which is very different from putting the entire front end under Boost. I expect that the parser will follow eventually, but even if it does, that's still not the entire front end. So, there is a difference in licenses does have a real impact. And no one can take the LGPL C++ code and port it to D - for the standard library or otherwise - without Walter's permission, because its his copyright on the code. As for the usefulness of a D lexer and parser, I've already had several programs or functions which I've wanted to write which would require it, and the lack has made them infeasible. For instance, I was considering posting a version of my datetime code without the unit tests in it, so that it would be easier to read the actual code (given the large number of unit tests), but I found that to accurately do that, you need a lexer for D, so I gave up on it for the time being. Having a function which stripped out unnecessary whitespace (and especially newlines) for string mixins would be great (particularly since line numbers get messed up with multi-line string mixins), but that would require CTFE-able D lexer to work correctly (though you might be able to hack together something which would mostly work), which we don't have. The D lexer won't be CTFE-able initially (though hopefully it will be once the CTFE capabilites of dmd improve), so you still won't be able to do that once the lexer is done, but it is a case where a lexer would be useful. are huge. It will take time to get there, and we'll need more developers, but I don't think that it really makes sense to not put things in the standard library because it might take more dev time - particularly when a D lexer is the sort of thing that likely won't need much changing once it's done, since it would only need to be changed when the language changed or when a bug with it was found (which would likely equate to a bug in the compiler anyway), so ultimately, the developer cost is likely fairly low. Additionally, Walter thinks that the development costs will be lower to have it in the standard library with an implementation similar to dmd's rather than having it separate. And it's his call. So, it's going to get done. There are several people around here who lament the lack of D parser in Phobos at least periodically, and I think that it will be good to have an appropriate lexer and parser for D in Phobos. 
Having other 3rd party stuff - like antlr - is great too, but that's no reason not to put it in the standard library. I think that we're just going to have to agree to disagree on this one. - Jonathan M DavisOn Friday, November 19, 2010 13:53:12 Bruno Medeiros wrote:Eh? That license argument doesn't make sense: if the lexer and parser were to be based on DMD itself, then putting it in the standard library is equivalent (in licensing terms) to licensing the lexer and parser parts of DMD in Boost. More correctly, what I mean by equivalent, is that there no reason why Walter would allow one thing and not the other... (because on both cases he would have to issue that license)On 19/11/2010 21:27, Jonathan M Davis wrote: And by providing a lexer and a parser outside the standard library, wouldn't it make it just as easy for those tools to be written? What's the advantage of being in the standard library? I see only disadvantages: to begin with it potentially increases the time that Walter or other Phobos contributors may have to spend on it, even if it's just reviewing patches or making sure the code works.If nothing, else, it makes it easier to keep in line with dmd itself. Since the dmd front end is LGPL, it's not possible to have a Boost port of it (like the Phobos version will be) without Walter's consent. And I'd be surprised if he did that for a third party library (though he seems to be pretty open on a lot of that kind of stuff). Not to mention, Walter and the core developers are _exactly_ the kind of people that you want working on a lexer or parser of the language itself, because they're the ones who work on it. - Jonathan M Davis
Nov 19 2010
On 20/11/2010 01:29, Jonathan M Davis wrote:On Friday, November 19, 2010 15:17:35 Bruno Medeiros wrote:There are some misunderstandings here. First, the DMD front-end is licenced under the GPL, not LGPL. Second, more importantly, it is actually also licensed under the Artistic license, a very permissible license. This is the basis for me stating that almost certainly Walter would not mind licensing the DMD parser and lexer under Boost, as it's actually not that different from the Artistic license.On 19/11/2010 22:02, Jonathan M Davis wrote:It's very different to have D implementation of something - which is based on a C++ version but definitely different in some respects - be under Boost and generally available, and having the C++ implementation be under Boost - particularly when the C++ version covers far more than just a lexer and parser. Someone _could_ port the D code back to C++ and have that portion useable under Boost, but that's a lot more work than just taking the C++ code and using it, and it's only the portions of the compiler which were ported to D to which could be re-used that way. And since the Boost code could be used in a commercial product while the LGPL is more restricted, it could make a definite difference. I'm not a licensing expert, and I'm not an expert on what Walter does and doesn't want done with his code, but he put the compiler front end under the LGPL, not Boost, and he's given his permission to have the lexer alone ported to D and put under the Boost license in the standard library, which is very different from putting the entire front end under Boost. I expect that the parser will follow eventually, but even if it does, that's still not the entire front end. So, there is a difference in licenses does have a real impact. And no one can take the LGPL C++ code and port it to D - for the standard library or otherwise - without Walter's permission, because its his copyright on the code.On Friday, November 19, 2010 13:53:12 Bruno Medeiros wrote:Eh? That license argument doesn't make sense: if the lexer and parser were to be based on DMD itself, then putting it in the standard library is equivalent (in licensing terms) to licensing the lexer and parser parts of DMD in Boost. More correctly, what I mean by equivalent, is that there no reason why Walter would allow one thing and not the other... (because on both cases he would have to issue that license)On 19/11/2010 21:27, Jonathan M Davis wrote: And by providing a lexer and a parser outside the standard library, wouldn't it make it just as easy for those tools to be written? What's the advantage of being in the standard library? I see only disadvantages: to begin with it potentially increases the time that Walter or other Phobos contributors may have to spend on it, even if it's just reviewing patches or making sure the code works.If nothing, else, it makes it easier to keep in line with dmd itself. Since the dmd front end is LGPL, it's not possible to have a Boost port of it (like the Phobos version will be) without Walter's consent. And I'd be surprised if he did that for a third party library (though he seems to be pretty open on a lot of that kind of stuff). Not to mention, Walter and the core developers are _exactly_ the kind of people that you want working on a lexer or parser of the language itself, because they're the ones who work on it. - Jonathan M Davisare huge. 
It will take time to get there, and we'll need more developers, but Ibigger than Phobos, and yet they have no functionality for lexing/parsing their own languages (or any other for that matter)! -- Bruno Medeiros - Software Engineer
Nov 24 2010
On 11/19/10 1:03 PM, Bruno Medeiros wrote:On 22/10/2010 20:48, Andrei Alexandrescu wrote:Even C has strtok. AndreiOn 10/22/10 14:02 CDT, Tomek Sowi艅ski wrote:Agreed, of all the things desired for D, a D tokenizer would rank pretty low I think. Another thing, even though a tokenizer generator would be much more desirable, I wonder if it is wise to have that in the standard library? It does not seem to be of wide enough interest to be in a standard library. (Out of curiosity, how many languages have such a thing in their standard library?)Dnia 22-10-2010 o 00:01:21 Walter Bright <newshound2 digitalmars.com> napisa艂(a):Yes. IMHO writing a D tokenizer is a wasted effort. We need a tokenizer generator.As we all know, tool support is important for D's success. Making tools easier to build will help with that. To that end, I think we need a lexer for the standard library - std.lang.d.lex. It would be helpful in writing color syntax highlighting filters, pretty printers, repl, doc generators, static analyzers, and even D compilers. It should: 1. support a range interface for its input, and a range interface for its output 2. optionally not generate lexical errors, but just try to recover and continue 3. optionally return comments and ddoc comments as tokens 4. the tokens should be a value type, not a reference type 5. generally follow along with the C++ one so that they can be maintained in tandem It can also serve as the basis for creating a javascript implementation that can be embedded into web pages for syntax highlighting, and eventually an std.lang.d.parse. Anyone want to own this?Interesting idea. Here's another: D will soon need bindings for CORBA, Thrift, etc, so lexers will have to be written all over to grok interface files. Perhaps a generic tokenizer which can be parametrized with a lexical grammar would bring more ROI, I got a hunch D's templates are strong enough to pull this off without any source code generation ala JavaCC. The books I read on compilers say tokenization is a solved problem, so the theory part on what a good abstraction should be is done. What you think?
Nov 19 2010
On 19/11/2010 23:39, Andrei Alexandrescu wrote:On 11/19/10 1:03 PM, Bruno Medeiros wrote:That's just a fancy splitter, I wouldn't call that a proper tokenizer. I meant something that, at the very least, would tokenize based on regular expressions (and have heterogenous tokens). -- Bruno Medeiros - Software EngineerOn 22/10/2010 20:48, Andrei Alexandrescu wrote:Even C has strtok. AndreiOn 10/22/10 14:02 CDT, Tomek Sowi艅ski wrote:Agreed, of all the things desired for D, a D tokenizer would rank pretty low I think. Another thing, even though a tokenizer generator would be much more desirable, I wonder if it is wise to have that in the standard library? It does not seem to be of wide enough interest to be in a standard library. (Out of curiosity, how many languages have such a thing in their standard library?)Dnia 22-10-2010 o 00:01:21 Walter Bright <newshound2 digitalmars.com> napisa艂(a):Yes. IMHO writing a D tokenizer is a wasted effort. We need a tokenizer generator.As we all know, tool support is important for D's success. Making tools easier to build will help with that. To that end, I think we need a lexer for the standard library - std.lang.d.lex. It would be helpful in writing color syntax highlighting filters, pretty printers, repl, doc generators, static analyzers, and even D compilers. It should: 1. support a range interface for its input, and a range interface for its output 2. optionally not generate lexical errors, but just try to recover and continue 3. optionally return comments and ddoc comments as tokens 4. the tokens should be a value type, not a reference type 5. generally follow along with the C++ one so that they can be maintained in tandem It can also serve as the basis for creating a javascript implementation that can be embedded into web pages for syntax highlighting, and eventually an std.lang.d.parse. Anyone want to own this?Interesting idea. Here's another: D will soon need bindings for CORBA, Thrift, etc, so lexers will have to be written all over to grok interface files. Perhaps a generic tokenizer which can be parametrized with a lexical grammar would bring more ROI, I got a hunch D's templates are strong enough to pull this off without any source code generation ala JavaCC. The books I read on compilers say tokenization is a solved problem, so the theory part on what a good abstraction should be is done. What you think?
Nov 24 2010
On 24/11/2010 13:30, Bruno Medeiros wrote:On 19/11/2010 23:39, Andrei Alexandrescu wrote:In other words, a lexer, that might be a better term in this context. -- Bruno Medeiros - Software EngineerOn 11/19/10 1:03 PM, Bruno Medeiros wrote:That's just a fancy splitter, I wouldn't call that a proper tokenizer. I meant something that, at the very least, would tokenize based on regular expressions (and have heterogenous tokens).On 22/10/2010 20:48, Andrei Alexandrescu wrote:Even C has strtok. AndreiOn 10/22/10 14:02 CDT, Tomek Sowi艅ski wrote:Agreed, of all the things desired for D, a D tokenizer would rank pretty low I think. Another thing, even though a tokenizer generator would be much more desirable, I wonder if it is wise to have that in the standard library? It does not seem to be of wide enough interest to be in a standard library. (Out of curiosity, how many languages have such a thing in their standard library?)Dnia 22-10-2010 o 00:01:21 Walter Bright <newshound2 digitalmars.com> napisa艂(a):Yes. IMHO writing a D tokenizer is a wasted effort. We need a tokenizer generator.As we all know, tool support is important for D's success. Making tools easier to build will help with that. To that end, I think we need a lexer for the standard library - std.lang.d.lex. It would be helpful in writing color syntax highlighting filters, pretty printers, repl, doc generators, static analyzers, and even D compilers. It should: 1. support a range interface for its input, and a range interface for its output 2. optionally not generate lexical errors, but just try to recover and continue 3. optionally return comments and ddoc comments as tokens 4. the tokens should be a value type, not a reference type 5. generally follow along with the C++ one so that they can be maintained in tandem It can also serve as the basis for creating a javascript implementation that can be embedded into web pages for syntax highlighting, and eventually an std.lang.d.parse. Anyone want to own this?Interesting idea. Here's another: D will soon need bindings for CORBA, Thrift, etc, so lexers will have to be written all over to grok interface files. Perhaps a generic tokenizer which can be parametrized with a lexical grammar would bring more ROI, I got a hunch D's templates are strong enough to pull this off without any source code generation ala JavaCC. The books I read on compilers say tokenization is a solved problem, so the theory part on what a good abstraction should be is done. What you think?
Nov 24 2010
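To make the distinction above concrete: a "proper" tokenizer in the sense Bruno describes assigns each token class its own regular expression and yields a stream of heterogeneous (kind, text) tokens, whereas strtok only splits on delimiters. A tiny sketch, assuming today's std.regex (matchFirst); the token kinds and patterns are made up for illustration, and a generated DFA would be far faster than re-matching regexes like this.

import std.regex  : matchFirst, regex;
import std.string : stripLeft;

enum TokKind { Number, Identifier, Operator }

struct Tok
{
    TokKind kind;
    string  text;
}

struct Spec
{
    TokKind kind;
    string  pattern;   // anchored regex for this token class
}

Tok[] tokenize(string src)
{
    auto specs = [
        Spec(TokKind.Number,     `^[0-9]+`),
        Spec(TokKind.Identifier, `^[a-zA-Z_][a-zA-Z_0-9]*`),
        Spec(TokKind.Operator,   `^(\+=|\+|-|\*|/)`),
    ];

    Tok[] result;
    src = stripLeft(src);
    while (src.length)
    {
        bool matched = false;
        foreach (spec; specs)
        {
            auto m = matchFirst(src, regex(spec.pattern));
            if (!m.empty)
            {
                result ~= Tok(spec.kind, m.hit);
                src = stripLeft(src[m.hit.length .. $]);
                matched = true;
                break;
            }
        }
        if (!matched)
            src = stripLeft(src[1 .. $]);  // skip unknown char (real code: error)
    }
    return result;
}

void main()
{
    auto toks = tokenize("x1 += 42");
    assert(toks.length == 3);
    assert(toks[0].kind == TokKind.Identifier && toks[0].text == "x1");
    assert(toks[1].kind == TokKind.Operator   && toks[1].text == "+=");
    assert(toks[2].kind == TokKind.Number     && toks[2].text == "42");
}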
Walter:As we all know, tool support is important for D's success. Making tools easier to build will help with that. To that end, I think we need a lexer for the standard library - std.lang.d.lex. It would be helpful in writing color syntax highlighting filters, pretty printers, repl, doc generators, static analyzers, and even D compilers.This is a quite long talk by Steve Yegge that I've just seen (linked from Reddit): http://vimeo.com/16069687 I don't suggest you watch it all unless you are very interested in that topic. But the most important thing it says is that, given that big software companies use several languages, and programmers often don't want to change their preferred IDE, there is a problem: given N languages and M editors/IDEs, the total toolchain effort is N * M. That means N syntax highlighters, N indenters, N refactoring suites, etc. Result: most languages have bad toolchains and most IDEs manage very well only one or very few languages. So he has suggested the Grok project, which reduces the toolchain effort to N + M. Each language needs to provide one of each service: indenter, highlighter, name resolver, refactoring engine, etc. Each IDE may then link (using a standard interface provided by Grok) to those services and use them. Today Grok is not available yet, and its development is at the first stages, but after this talk I think that it may be positive to add to Phobos not just the D lexer, but also other things at a somewhat higher level: an indenter, highlighter, name resolver, refactoring engine, etc. Even if they don't use the standard universal interface used by Grok, I think they may speed up the development of the D toolchain. Bye, bearophile
Oct 23 2010
This is a quite long talk by Steve Yegge that I've just seen (linked from Reddit): http://vimeo.com/16069687Sorry, the Reddit thread: http://www.reddit.com/r/programming/comments/dvd9x/steve_yegge_on_scalable_programming_language/
Oct 23 2010
"bearophile" <bearophileHUGS lycos.com> wrote in message news:i9vs3v$142e$1 digitalmars.com...Walter:I haven't looked at the video, but that sounds like the direction I've had in mind for Goldie.As we all know, tool support is important for D's success. Making tools easier to build will help with that. To that end, I think we need a lexer for the standard library - std.lang.d.lex. It would be helpful in writing color syntax highlighting filters, pretty printers, repl, doc generators, static analyzers, and even D compilers.This is a quite long talk by Steve Yegge that I've just seen (linked from Reddit): http://vimeo.com/16069687 I don't suggest you to see it all unless you are very interested in that topic. But the most important thing it says is that, given that big software companies use several languages, and programmers often don't want to change their preferred IDE, there is a problem: given N languages and M editors/IDEs, total toolchain effort is N * M. That means N syntax highlighters, N indenters, N refactoring suites, etc. Result: most languages have bad toolchains and most IDEs manage very well only one or very few languages. So he has suggested the Grok project, that allows to reduce the toolchain effort to N + M. Each language needs to have one of each service: indenter, highlighter, name resolver, refactory, etc. So each IDE may link (using a standard interface provided by Grok) to those services and use them. Today Grok is not available yet, and its development is at the first stages, but after this talk I think that it may be positive to add to Phobos not just the D lexer, but also other things, even a bit higher level as an indenter, highlighter, name resolver, refactory, etc. Even if they don't use the standard universal interface used by Grok I think they may speed up the development of the D toolchain.
Oct 23 2010
On 24/10/2010 00:46, bearophile wrote:Walter:Hum, very interesting topic! A few disjoint comments: (*) I'm glad to see another person, especially one who is "prominent" in the development community (like Andrei), discuss the importance of the toolchain, specificaly IDEs, for emerging languages. Or for any language for that matter. At the beggining of the talk I was like "man, this is spot-on, that's what I've said before, I wish Walter would *hear* this"! LOL, imagine my surprise when I found that Walter was in fact *there*! (When I saw the talk I didn't even know this was at NWCPP, otherwise I might have suspected) (*) I actually thought about some similar ideas before, for example, I thought about the idea of exposing some (if not all) of the functionality of DDT through the command-line (note that Eclipse can run headless, without any UI). And this would not be just semantic/indexer functionality, so for example: * DDoc generation, like Descent had at some point (http://www.mail-archive.com/digitalmars-d-announce puremagic.com/msg02734.html) * build functionality - only really interesting if the DDT builder becomes smarter, ie, does more useful stuff than what it does now. * semantic functionality: find-ref, code completion. (*) I wished I was at that talk, I would have liked to ask and discuss some things with Steve Yegge, particularly his comments about Eclipse's indexer. I become curious for details about what he thinks is wrong about Eclipse's indexer. Also, I wonder if he's not conflating "CDT's indexer" with "Eclipse indexer", because actually there is no such thing as a "Eclipse indexer". I'm gonna take a better look at the comments for this one. (*) As for Grok itself, it looks potentially interesting, but I still have only a very vague impression of what it does (let alone *how*). -- Bruno Medeiros - Software EngineerAs we all know, tool support is important for D's success. Making tools easier to build will help with that. To that end, I think we need a lexer for the standard library - std.lang.d.lex. It would be helpful in writing color syntax highlighting filters, pretty printers, repl, doc generators, static analyzers, and even D compilers.This is a quite long talk by Steve Yegge that I've just seen (linked from Reddit): http://vimeo.com/16069687 I don't suggest you to see it all unless you are very interested in that topic. But the most important thing it says is that, given that big software companies use several languages, and programmers often don't want to change their preferred IDE, there is a problem: given N languages and M editors/IDEs, total toolchain effort is N * M. That means N syntax highlighters, N indenters, N refactoring suites, etc. Result: most languages have bad toolchains and most IDEs manage very well only one or very few languages. So he has suggested the Grok project, that allows to reduce the toolchain effort to N + M. Each language needs to have one of each service: indenter, highlighter, name resolver, refactory, etc. So each IDE may link (using a standard interface provided by Grok) to those services and use them. Today Grok is not available yet, and its development is at the first stages, but after this talk I think that it may be positive to add to Phobos not just the D lexer, but also other things, even a bit higher level as an indenter, highlighter, name resolver, refactory, etc. Even if they don't use the standard universal interface used by Grok I think they may speed up the development of the D toolchain. Bye, bearophile
Nov 24 2010
On 24/10/2010 00:46, bearophile wrote:be used in this way. The Eclipse plugin for Scala (and I assume the Netbeans and IDEA plugins work similarly) is really just a wrapper around the compiler because the compiler can be used as a library, allowing a rich IDE with minimal effort because rather than implementing parsing and semantic analysis, the IDE team can just query the compiler's data structures.Walter: As we all know, tool support is important for D's success. Making toolsFrom watching this, I'm reminded that in the Scala world, the compiler caneasier to build will help with that. To that end, I think we need a lexer for the standard library - std.lang.d.lex. It would be helpful in writing color syntax highlighting filters, pretty printers, repl, doc generators, static analyzers, and even D compilers.This is a quite long talk by Steve Yegge that I've just seen (linked from Reddit): http://vimeo.com/16069687 I don't suggest you to see it all unless you are very interested in that topic. But the most important thing it says is that, given that big software companies use several languages, and programmers often don't want to change their preferred IDE, there is a problem: given N languages and M editors/IDEs, total toolchain effort is N * M. That means N syntax highlighters, N indenters, N refactoring suites, etc. Result: most languages have bad toolchains and most IDEs manage very well only one or very few languages. So he has suggested the Grok project, that allows to reduce the toolchain effort to N + M. Each language needs to have one of each service: indenter, highlighter, name resolver, refactory, etc. So each IDE may link (using a standard interface provided by Grok) to those services and use them. Today Grok is not available yet, and its development is at the first stages, but after this talk I think that it may be positive to add to Phobos not just the D lexer, but also other things, even a bit higher level as an indenter, highlighter, name resolver, refactory, etc. Even if they don't use the standard universal interface used by Grok I think they may speed up the development of the D toolchain. Bye, bearophile
Nov 24 2010
On 24/11/2010 18:48, Andrew Wiley wrote:On 24/10/2010 00:46, bearophile wrote: Walter: As we all know, tool support is important for D's success. Making tools easier to build will help with that. To that end, I think we need a lexer for the standard library - std.lang.d.lex. It would be helpful in writing color syntax highlighting filters, pretty printers, repl, doc generators, static analyzers, and even D compilers. This is a quite long talk by Steve Yegge that I've just seen (linked from Reddit): http://vimeo.com/16069687 I don't suggest you to see it all unless you are very interested in that topic. But the most important thing it says is that, given that big software companies use several languages, and programmers often don't want to change their preferred IDE, there is a problem: given N languages and M editors/IDEs, total toolchain effort is N * M. That means N syntax highlighters, N indenters, N refactoring suites, etc. Result: most languages have bad toolchains and most IDEs manage very well only one or very few languages. So he has suggested the Grok project, that allows to reduce the toolchain effort to N + M. Each language needs to have one of each service: indenter, highlighter, name resolver, refactory, etc. So each IDE may link (using a standard interface provided by Grok) to those services and use them. Today Grok is not available yet, and its development is at the first stages, but after this talk I think that it may be positive to add to Phobos not just the D lexer, but also other things, even a bit higher level as an indenter, highlighter, name resolver, refactory, etc. Even if they don't use the standard universal interface used by Grok I think they may speed up the development of the D toolchain. Bye, bearophile From watching this, I'm reminded that in the Scala world, the compiler can be used in this way. The Eclipse plugin for Scala (and I assume the Netbeans and IDEA plugins work similarly) is really just a wrapper around the compiler because the compiler can be used as a library, allowing a rich IDE with minimal effort because rather than implementing parsing and semantic analysis, the IDE team can just query the compiler's data structures.Interesting, very wise of them to do that. But not very surprising, Scala is close to the Java world, so they (the Scala people) must have known how important it would be to have the best toolchain possible, in order to compete (with Java, JDT, also Visual Studio, etc.). -- Bruno Medeiros - Software Engineer
Nov 25 2010
"Walter Bright" <newshound2 digitalmars.com> wrote in message news:i9qd8q$1ls4$1 digitalmars.com...4. the tokens should be a value type, not a reference typeI'm curious, is your reason for this purely to avoid allocations during lexing, or are there other reasons too? If it's mainly to avoid allocations during lexing then, maybe I've understood wrong, but isn't D2 getting the ability to construct class objects in-place into pre-allocated memory? (or already has the ability?) If so, do you think just creating the tokens that way would likely be close enough?
Oct 26 2010
Nick Sabalausky wrote:"Walter Bright" <newshound2 digitalmars.com> wrote in message news:i9qd8q$1ls4$1 digitalmars.com...It's one big giant reason. Storage allocation gets unbelievably costly in a lexer. Another is it makes tokens easy to copy. Another one is that classes are for polymorphic behavior. What kind of polymorphic behavior would one want with tokens?4. the tokens should be a value type, not a reference typeI'm curious, is your reason for this purely to avoid allocations during lexing, or are there other reasons too?If it's mainly to avoid allocations during lexing then, maybe I've understood wrong, but isn't D2 getting the ability to construct class objects in-place into pre-allocated memory?If you do that, might as well make them value types. The only reason classes exist is to support runtime polymorphism. C++ made a vast mistake in failing to distinguish between value types and reference types. Java made a related mistake by failing to acknowledge that value types have any useful purpose at all (unless they are built-in).
Oct 26 2010
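A minimal sketch of the kind of value-type token Walter is arguing for: a plain struct, cheap to copy, that never allocates, since the text is just a slice into the source buffer. The field layout and the toy lexing function are illustrative only (they are not dmd's actual Token), and a real implementation would expose a range of tokens rather than build an array.

enum TOK { identifier, intLiteral, plus, eof /* ... */ }

struct Token
{
    TOK    kind;    // which terminal this is
    string text;    // slice of the original source, no copy made
    size_t line;    // for diagnostics
}

Token[] lexDigitsAndPlus(string src)
{
    Token[] toks;
    size_t i, line = 1;
    while (i < src.length)
    {
        if (src[i] == '\n') { ++line; ++i; }
        else if (src[i] == '+') { toks ~= Token(TOK.plus, src[i .. i + 1], line); ++i; }
        else if (src[i] >= '0' && src[i] <= '9')
        {
            auto start = i;
            while (i < src.length && src[i] >= '0' && src[i] <= '9') ++i;
            toks ~= Token(TOK.intLiteral, src[start .. i], line);
        }
        else ++i;  // skip anything else in this toy
    }
    toks ~= Token(TOK.eof, null, line);
    return toks;
}

void main()
{
    auto toks = lexDigitsAndPlus("12 + 34");
    assert(toks[0].kind == TOK.intLiteral && toks[0].text == "12");
    assert(toks[1].kind == TOK.plus);
    assert(toks[2].text == "34");
    assert(toks[3].kind == TOK.eof);
}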
Walter:Java made a related mistake by failing to acknowledge that value types have any useful purpose at all (unless they are built-in).Java was designed to be simple! Simple means to have a more uniform semantics. Removing value types was a good idea if you want to simplify a language (and remove a mountain of details from C++). And from the huge success of Java, I a more complex than Java). The Java VM also is now often able to allocate not escaping objects on the stack (escape analysis) regaining some of the lost performance. What I miss more in Java is not single structs (single values), but a way to build an array of values (structs). Because using parallel arrays is not nice at all. Even in Python using numPy you may create an array of structs (compound value items). Bye, bearophile
Oct 26 2010
bearophile wrote:Walter:So was Pascal. See the thread about how useless it was as a result. A hatchet is a very simple tool, easy to understand, and I could build a house with nothing but a hatchet. But it would make the house several times as expensive to build, and it would look like it was built with a hatchet.Java made a related mistake by failing to acknowledge that value types have any useful purpose at all (unless they are built-in).Java was designed to be simple! Simple means to have a more uniform semantics.Removing value types was a good idea if you want to simplify a language (and remove a mountain of details from C++). And from the huge often able to allocate not escaping objects on the stack (escape analysis) regaining some of the lost performance.The issue isn't just about lost performance. It's about proper encapsulation of a type. Value types and polymorphic types are different, have different purposes, different behaviors, etc. Conflating the two into the same construct makes for poor and confusing abstractions. It shifts the problem out of the language and onto the programmer. It does NOT make the complexity go away.What I miss more in Java is not single structs (single values),There's a lot more to miss than that. I find Java code tends to be excessively complex, and that's because it lacks expressive power. It was summed up for me by a colleague who said that one needs an IDE to program in Java because with one button it will auto-generate 100 lines of boilerplate.
Oct 26 2010
Tue, 26 Oct 2010 21:39:32 -0700, Walter Bright wrote:bearophile wrote:Blablabla.. this nostalgic lesson reminded me, have you even started studying the list of type system concepts I listed few days ago. A new version with links is coming at some point of time.Walter:So was Pascal. See the thread about how useless it was as a result.Java made a related mistake by failing to acknowledge that value types have any useful purpose at all (unless they are built-in).Java was designed to be simple! Simple means to have a more uniform semantics.Adding structs to Java wouldn't fix that. You probably know that. Unifying structs and classes in a language like D and adding good escape analysis wouldn't worsen the performance that badly in general purpose applications. Java is mostly used for general purpose programming so your claims about usefulness and the need for extreme performance look silly.What I miss more in Java is not single structs (single values),There's a lot more to miss than that. I find Java code tends to be excessively complex, and that's because it lacks expressive power.
Oct 27 2010
retard wrote:have you even started studying the list of type system concepts I listed few days ago.Java has proved that such things aren't useful in programming languages :-)Adding structs to Java wouldn't fix that. You probably know that. Unifying structs and classes in a language like D and adding good escape analysis wouldn't worsen the performance that badly in general purpose applications. Java is mostly used for general purpose programming so your claims about usefulness and the need for extreme performance look silly.If that were true, why are Java char/int/double types value types, not a reference type derived from Object?
Oct 27 2010
Walter:So was Pascal. See the thread about how useless it was as a result.But Java is probably currently the most used language, so I guess they have created a simpler language, but not too much simple as Pascal was.Value types and polymorphic types are different, have differentpurposes, different behaviors, etc. Right.Conflating the two into the same construct makes for poor and confusing abstractions.<In Python there are (more or less) only objects, and they are managed "by name" (similar a "by reference") and it works well enough.It shifts the problem out of the language and onto the programmer. It does NOT make the complexity go away.<This is partially true. The presence of just objects doesn't solve all problems, so part of the complexity doesn't go away, it goes into the program. On the other hand value semantics introduces a large amount of complexity by itself (in C++ there is a huge amount of stuff about this semantics, and even in D the design is unfinished still after ten years and after all the experience with C++). So in my opinion in the end the net result is that removing structs makes the language+programs simpler.There's a lot more to miss than that. I find Java code tends to be excessively complex, and that's because it lacks expressive power. It was summed up for me by a colleague who said that one needs an IDE to program in Java because with one button it will auto-generate 100 lines of boilerplate.Yes, clearly Java has several faults. It's far from perfect. But introducing structs inside Java is in my opinion not going to solve those problems much.[...] If that were true, why are Java char/int/double types value types, not a reference type derived from Object?For performance reasons, because originally Java didn't have the advanced compilation strategies used today. Languages like Clojure that run on the JavaVM use more reference types (for integer numbers too). After all this discussion I want to remind you that I am here because I like D and I like D structs, unions and all that :-) I prefer to use D many times over Java. And I agree that structs (or tagged unions) are better in D for the lexer if you want the lexer to be quite fast. Bye, bearophile
Oct 27 2010
bearophile wrote:After all this discussion I want to remind you that I am here because I like D and I like D structs, unions and all that :-) I prefer to use D many times over Java. And I agree that structs (or tagged unions) are better in D for the lexer if you want the lexer to be quite fast.So, there is "value" in value types after all. I confess I have no idea why you argue against them.
Oct 27 2010
Walter:So, there is "value" in value types after all. I confess I have no idea why you argue against them.I am not arguing against them in absolute. They are good in some situations and not so good in other situations :-) Compound value types are very useful in a certain imperative low-level language. While if you are designing a simpler language, it's better to not add structs to it (and yes, in practice the world needs simpler languages too, not everyone needs a Ferrari in every moment. And I believe that removing structs makes on average simpler the sum of the language+its programs). Bye, bearophile
Oct 27 2010
On 27/10/2010 05:39, Walter Bright wrote:I've been hearing that a lot, but I find it to be excessively exaggerated. Can you give some concrete examples? Regarding excessive verbosity in Java, I can only remember three significant things at the moment (at least disregarding metaprogramming), and one of them is nearly as verbose in D as in Java: 1) writing getters and setters for fields 2) verbose syntax for closures (need to use an anonymous class, outer variables must be final, and wrapped in an array if write access is needed) 3) writing trivial constructors whose parameters mirror the fields and then assign the parameters to the fields. I don't think 1 and 2 happen often enough to be that much of an annoyance. (unless you're one of those Java people who think that directly accessing the public field of another class is a sin, and instead every single field must have getters/setters and never ever be public...) As an additional note, I don't think having an IDE auto-generate X lines of boilerplate code is necessarily a bad thing. It's only bad if the alternative of having a better language feature would actually save me coding time (whether initial coding, or subsequent modifications) or improve code understanding. _Isn't this what matters?_ -- Bruno Medeiros - Software EngineerWhat I miss more in Java is not single structs (single values),There's a lot more to miss than that. I find Java code tends to be excessively complex, and that's because it lacks expressive power. It was summed up for me by a colleague who said that one needs an IDE to program in Java because with one button it will auto-generate 100 lines of boilerplate.
Nov 19 2010
On 27/10/2010 05:39, Walter Bright wrote:bearophile wrote:There's good simple, and there's bad simple... -- Bruno Medeiros - Software EngineerWalter: Java was designed to be simple! Simple means to have a more uniform semantics.So was Pascal. See the thread about how useless it was as a result.
Nov 19 2010
"Walter Bright" <newshound2 digitalmars.com> wrote in message news:ia8321$vug$1 digitalmars.com...Nick Sabalausky wrote:Honestly, I'm not entirely certain whether or not Goldie actually needs its tokens to be classes instead of structs, but I'll explain my current usage: In the basic "dynamic" style, every token in Goldie, terminal or nonterminal, is of type "class Token" no matter what its symbol or part of grammar it represents. But Goldie has an optional "static" style which creates a class hierarchy, for the sake of compile-time type safety, with "Token" as the base. Suppose you have a grammar named "simple" that's a series of one or more a's and b's separated by plus signs: <Item> ::= 'a' | 'b' <List> ::= <List> '+' <Item> | <Item> So there's three terminals: "a", "b", and "+" And two nonterminals: "<Item>" and "<List>", each with exactly two possible reductions. So in dynamic-style, all of those are "class Token", and that's all that exists. But with the optional static-style, the following class hierarchy is effectively created: class Token; class Token_simple : Token; class Token_simple!"a" : Token_simple; class Token_simple!"b" : Token_simple; class Token_simple!"+" : Token_simple; class Token_simple!"<Item>" : Token_simple; class Token_simple!"<List>" : Token_simple; class Token_simple!("<Item>", "a") : Token_simple!"<Item>"; class Token_simple!("<Item>", "b") : Token_simple!"<Item>"; class Token_simple!("<List>", "<List>", "+", "<Item>") : Token_simple!"<List>"; class Token_simple!("<List>", "<Item>") : Token_simple!"<List>"; So rules inherit from the nonterminal they reduce to; terminals and nonterminals inherit from a dummy class dedicated specifically to the given grammar; and that inherits from plain old dynamic-style "Token". All of those template parameters are validated at compile-time. (At some point I'd like to make it possible to specify the rule-based tokens as something like: Token!("<List> ::= <List> '+' <Item>"), but I haven't gotten to it yet.) There's one more trick: The plain old Token exposes a member "subX" which can be numerically indexed to obtain the sub-tokens (for terminals, subX.length==0): void foo(Token token) { if(token.matches("<List>", "<List>", "+", "<Item>")) { auto leftSide = token.subX[0]; auto rightSide = token.subX[2]; //auto dummy = token.subX[10]; // Run-time error static assert(is(typeof(leftSide) == Token)); static assert(is(typeof(rightSide) == Token)); } } Note that it's impossible for the "matches" function to verify at compile-time that its arguments are valid. All of the static-style token types retain all of that for total compatibility with the dynamic-style. But the static-style nonterminals provide an additional member, "sub", that can be used like this: void foo(Token_simple!("<List>", "<List>", "+", "<Item>") token) { auto leftSide = token.sub!0; auto rightSide = token.sub!2; //auto dummy = token.sub!10; // Compile-time error static assert(is(typeof(leftSide) == Token_simple!"<List>")); static assert(is(typeof(rightSide) == Token_simple!"<Item>")); } As for whether or not this effect can be reasonably accomplished with structs: I have no idea, I haven't really looked into it."Walter Bright" <newshound2 digitalmars.com> wrote in message news:i9qd8q$1ls4$1 digitalmars.com...It's one big giant reason. Storage allocation gets unbelievably costly in a lexer. Another is it makes tokens easy to copy. Another one is that classes are for polymorphic behavior. What kind of polymorphic behavior would one want with tokens?4. 
the tokens should be a value type, not a reference type. I'm curious, is your reason for this purely to avoid allocations during lexing, or are there other reasons too?
Oct 26 2010
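As one data point for the struct question above, here is a minimal sketch -- the names are purely illustrative and are not Goldie's actual API -- of how a compile-time-checked sub-token accessor could be approximated while keeping the token a value type. Whether this scales to Goldie's full static-style hierarchy is an open question; it only shows that the bounds check itself does not require classes.

// Hypothetical sketch, not Goldie's API: the rule's right-hand side is a
// compile-time string array, so an out-of-range sub-token index is
// rejected at compile time while the token itself stays a value type.
struct RuleTok(string lhs, string[] rhs)
{
    string[] parts;                 // matched text of each sub-token

    auto sub(size_t i)() const
        if (i < rhs.length)         // bad index = compile-time error
    {
        return parts[i];
    }
}

unittest
{
    RuleTok!("<List>", ["<List>", "+", "<Item>"]) tok;
    tok.parts = ["a", "+", "b"];
    assert(tok.sub!0 == "a");
    // tok.sub!10;                  // would not compile
}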
Nick Sabalausky wrote:As for whether or not this effect can be reasonably accomplished with structs: I have no idea, I haven't really looked into it.I use a tagged variant for the token struct. This doesn't make any difference if one is parsing small pieces of code. But when you're trying to stuff millions of lines of code down its maw, avoiding an allocation per token is a big deal. The indirect calls to the member functions of a class also perform poorly relative to tagged variants.
Oct 26 2010
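For readers unfamiliar with the approach, a minimal sketch of a tagged-variant value token follows. The enum and field names here are made up for illustration; this is not DMD's actual token layout.

// Illustrative only: the tag says which union member is valid, so a
// token is a small value type and the lexer allocates nothing per token.
enum TOK { identifier, intLiteral, floatLiteral, stringLiteral, plus, eof }

struct Token
{
    TOK kind;              // the tag
    size_t line;           // source location
    union
    {
        string text;       // identifiers, string literals
        long   intValue;   // integer literals
        double floatValue; // floating-point literals
    }
}

unittest
{
    Token t;
    t.kind = TOK.intLiteral;
    t.intValue = 42;
    assert(t.kind == TOK.intLiteral && t.intValue == 42);
}

Dispatch on kind is a switch rather than a virtual call, which is where the speed difference over a class hierarchy comes from.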
Tue, 26 Oct 2010 19:32:44 -0700, Walter Bright wrote:Nick Sabalausky wrote:This is why the basic data structure in functional languages, algebraic data types, suits better for this purpose."Walter Bright" <newshound2 digitalmars.com> wrote in message news:i9qd8q$1ls4$1 digitalmars.com...It's one big giant reason. Storage allocation gets unbelievably costly in a lexer. Another is it makes tokens easy to copy. Another one is that classes are for polymorphic behavior. What kind of polymorphic behavior would one want with tokens?4. the tokens should be a value type, not a reference typeI'm curious, is your reason for this purely to avoid allocations during lexing, or are there other reasons too?
Oct 27 2010
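For comparison, D can express roughly the same idea with Phobos's std.variant.Algebraic, a closed sum of cases in the spirit of an algebraic data type. The alternative types below are invented for illustration and are not taken from any existing lexer.

import std.variant : Algebraic;

// Illustrative alternatives only.
struct Identifier { string name; }
struct IntLiteral { long value; }
struct Operator   { string symbol; }

alias Token = Algebraic!(Identifier, IntLiteral, Operator);

unittest
{
    Token t = Identifier("foo");
    assert(t.peek!Identifier !is null);     // which case is stored?
    assert(t.peek!Identifier.name == "foo");
    assert(t.peek!IntLiteral is null);
}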
retard wrote:This is why the basic data structure in functional languages, algebraic data types, suits better for this purpose.I think you recently demonstrated otherwise, as proven by the widespread use of Java :-)
Oct 27 2010
Wed, 27 Oct 2010 12:08:19 -0700, Walter Bright wrote:retard wrote:I don't understand your logic -- widespread use of Java proves that algebraic data types aren't a better-suited way of expressing a compiler's data structures such as syntax trees?This is why the basic data structure in functional languages, algebraic data types, suits better for this purpose.I think you recently demonstrated otherwise, as proven by the widespread use of Java :-)
Oct 27 2010
retard wrote:Wed, 27 Oct 2010 12:08:19 -0700, Walter Bright wrote:You told me that widespread use of Java proved that nothing more complex than what Java provides is useful: "Java is mostly used for general purpose programming so your claims about usefulness and the need for extreme performance look silly." I'd be surprised if you seriously meant that, as it implies that Java is the pinnacle of computer language design, but I can't resist teasing you about it. :-)retard wrote:I don't understand your logic -- Widespread use of Java proves that algebraic data types aren't a better suited way for expressing compiler's data structures such as syntax trees?This is why the basic data structure in functional languages, algebraic data types, suits better for this purpose.I think you recently demonstrated otherwise, as proven by the widespread use of Java :-)
Oct 27 2010
Wed, 27 Oct 2010 13:52:29 -0700, Walter Bright wrote:retard wrote:I only meant that the widespread adoption of Java shows how the public at large cares very little about the performance issues you mentioned. Java is one of the most widely used languages and it's also successful in many fields. Things could be better from programming language theory's point of view, but the business world is more interested in profits, and the large pool of Java coders has given better returns than more expressive languages. I don't think that says anything against my notes about algebraic data types.Wed, 27 Oct 2010 12:08:19 -0700, Walter Bright wrote:You told me that widespread use of Java proved that nothing more complex than what Java provides is useful: "Java is mostly used for general purpose programming so your claims about usefulness and the need for extreme performance look silly." I'd be surprised if you seriously meant that, as it implies that Java is the pinnacle of computer language design, but I can't resist teasing you about it. :-)retard wrote:I don't understand your logic -- Widespread use of Java proves that algebraic data types aren't a better suited way for expressing compiler's data structures such as syntax trees?This is why the basic data structure in functional languages, algebraic data types, suits better for this purpose.I think you recently demonstrated otherwise, as proven by the widespread use of Java :-)
Oct 27 2010
retard wrote:I only meant that the widespead adoption of Java shows how the public at large cares very little about the performance issues you mentioned Java is one of the most widely used languages and it's also successful in many fields. Things could be better from programming language theory's point of view, but the business world is more interesting in profits and the large pool of Java coders has given better benefits than more expressive languages. I don't think that says anything against my notes about algebraic data types.Choice of a language has numerous factors, so you cannot dismiss one factor because the other factors still make it an attractive choice. For example: "the widespread adoption of horses shows how the public at large cares very little about the cars you mentioned."
Oct 27 2010
Wed, 27 Oct 2010 14:15:04 -0700, Walter Bright wrote:retard wrote:I know that.I only meant that the widespead adoption of Java shows how the public at large cares very little about the performance issues you mentioned Java is one of the most widely used languages and it's also successful in many fields. Things could be better from programming language theory's point of view, but the business world is more interesting in profits and the large pool of Java coders has given better benefits than more expressive languages. I don't think that says anything against my notes about algebraic data types.Choice of a language has numerous factors, so you cannot dismiss one factor because the other factors still make it an attractive choice.I don't think I said anything that contradicts that.For example: "the widespread adoption of horses shows how the public at large cares very little about the cars you mentioned."I meant caring in a way that results in masses of programmers migrating their code from Java to a language with those performance issues solved (e.g. D). A layman can make general remarks from people switching from Java/C++/C to "new" languages such as Groovy, Javascript, Python, PHP, and Ruby. The people want "simpler" languages. For example Ruby has terrible performance, but the performance becomes a non-issue once the web service framework is built in a scalable way.
Oct 27 2010
"retard" <re tard.com.invalid> wrote in message news:iaa44v$17sf$2 digitalmars.com...I only meant that the widespead adoption of Java shows how the public at large cares very little about the performance issues you mentioned.The public at large is convinced that "Java is fast now, really!". So I'm not certain widespread adoption of Java necessarily indicates they don't care so much about performance. Of course, Java is quickly becoming a legacy language anyway (the next COBOL, IMO), so that throws another wrench into the works.
Oct 27 2010
Legacy in the sense that C is perhaps. http://www.tiobe.com/index.php/content/paperinfo/tpci/index.html Nick Sabalausky wrote:"retard" <re tard.com.invalid> wrote in message news:iaa44v$17sf$2 digitalmars.com...I only meant that the widespead adoption of Java shows how the public at large cares very little about the performance issues you mentioned.The public at large is convinced that "Java is fast now, really!". So I'm not certain widespread adoption of Java necessarily indicates they don't care so much about performance. Of course, Java is quickly becoming a legacy language anyway (the next COBOL, IMO), so that throws another wrench into the works.
Oct 27 2010
Wed, 27 Oct 2010 16:04:34 -0600, Todd D. VanderVeen wrote:Legacy in the sense that C is perhaps. http://www.tiobe.com/index.php/content/paperinfo/tpci/index.htmlProbably the top 10 names are more or less correct there, but some funny notes: 33. D 36. Scratch 40. Haskell 42. JavaFX Script 49. Scala Scratch is an educational tool. It isn't really suitable for any real world applications. It slows down considerably with too many expressions. Haskell and Scala, on the other hand, both have several books on them, active mailing lists, and many very active community projects. Haven't heard much about JavaFX outside Sun/Oracle. These statistics look really weird.
Oct 27 2010
retard wrote:Wed, 27 Oct 2010 16:04:34 -0600, Todd D. VanderVeen wrote:I reckon Fortran is the one to look at. If Tiobe's stats were sensible, the Fortran numbers would be solid as a rock. And Ada ought to be pretty stable too. But look at this: http://www.tiobe.com/index.php/paperinfo/tpci/Ada.html Laughable.Legacy in the sense that C is perhaps. http://www.tiobe.com/index.php/content/paperinfo/tpci/index.htmlProbably the top 10 names are more or less correct there, but some funny notes: 33. D 36. Scratch 40. Haskell 42. JavaFX Script 49. Scala Scratch is an educational tool. It isn't really suitable for any real world applications. It slows down considerably with too many expressions. There are several books about Haskell and Scala. Both have several books on them, active mailing lists, and also very many active community projects. Haven't heard much about JavaFX outside Sun/Oracle. These statistics look really weird.
Oct 28 2010
On 28.10.2010 16:46, Don wrote:retard wrote:There was an article in the c't magazine (German) where they took a closer look at these rankings. For example: searching for 'C' you got 3080 M results, searching for 'Java' you got 167 M. 'Java' only competes with the island Java; 'C' competes with C&A, c't-Magazin, C-Quadrat, C+C, char 'c', ... and many many more. So, to correct this, only the first *100* (hundred) results are reviewed and the resulting factor applied to the sum of results. Just look at 1-100, then at 101-200, 201-300 and so on; you get completely different factors. So these numbers at TIOBE are really lying!!!!!! source: http://www.heise.de/developer/artikel/Traue-keiner-Statistik-993137.html greets MatthiasWed, 27 Oct 2010 16:04:34 -0600, Todd D. VanderVeen wrote:I reckon Fortran is the one to look at it. If Tiobe's stats were sensible, the Fortran numbers would be solid as a rock. And ADA ought to be pretty stable too. But look at this: http://www.tiobe.com/index.php/paperinfo/tpci/Ada.html Laughable.Legacy in the sense that C is perhaps. http://www.tiobe.com/index.php/content/paperinfo/tpci/index.htmlProbably the top 10 names are more or less correct there, but some funny notes: 33. D 36. Scratch 40. Haskell 42. JavaFX Script 49. Scala Scratch is an educational tool. It isn't really suitable for any real world applications. It slows down considerably with too many expressions. There are several books about Haskell and Scala. Both have several books on them, active mailing lists, and also very many active community projects. Haven't heard much about JavaFX outside Sun/Oracle. These statistics look really weird.
Oct 28 2010
On 27/10/2010 22:43, Nick Sabalausky wrote:"retard"<re tard.com.invalid> wrote in message news:iaa44v$17sf$2 digitalmars.com...Java is quickly becoming a legacy language? the next COBOL? SRSLY?... Just two years ago, the now hugely popular Android platform chose Java as its language of choice, and you think Java is becoming legacy?... The development of the Java language itself has stagnated over the last 6 years or so (especially due to corporate politics, which has now become even worse and more uncertain with all the shit Oracle is doing), but that's a completely different statement from saying Java is becoming legacy. In fact, all the uproar and concern about the future of Java under Oracle, of the JVM, of the JCP (the body that regulates changes to Java), etc., is a testament to the huge popularity of Java. Otherwise people (and corporations) wouldn't care, they would just let it wither away with much less concern. -- Bruno Medeiros - Software EngineerI only meant that the widespead adoption of Java shows how the public at large cares very little about the performance issues you mentioned.The public at large is convinced that "Java is fast now, really!". So I'm not certain widespread adoption of Java necessarily indicates they don't care so much about performance. Of course, Java is quickly becoming a legacy language anyway (the next COBOL, IMO), so that throws another wrench into the works.
Nov 19 2010
Bruno Medeiros:Java is quickly becoming a legacy language? the next COBOL? SRSLY?... Just two years ago, the now hugely popular Android platform chose Java as its language of choice, and you think Java is becoming legacy?...Java on Android is not going well; there is an Oracle->Google lawsuit in progress. Google is interested in using a variant of NaCl on Android too. Bye, bearophile
Nov 19 2010
On Fri, Nov 19, 2010 at 4:20 PM, bearophile <bearophileHUGS lycos.com>wrote:Bruno Medeiros:I have to agree with Bruno here, Java isn't going anywhere soon. It has an active community, corporations that very actively support it, and an open source effort that's probably the largest of any language (check out the Apache Foundation project lists). Toss in Clojure, Scala, Groovy, and their friends that can build on top of Java libraries, and you wind up with a package that isn't becoming obsolete any time soon.Java is quickly becoming a legacy language? the next COBOL? SRSLY?... Just two years ago, the now hugely popular Android platform choose Java as it's language of choice, and you think Java is becoming legacy?...Java on Adroid is not going well, there is a Oracle->Google lawsuit in progress. Google is interested in using a variant of NaCL on Android too.
Nov 19 2010
"Andrew Wiley" <debio264 gmail.com> wrote in message news:mailman.501.1290205603.21107.digitalmars-d puremagic.com...On Fri, Nov 19, 2010 at 4:20 PM, bearophile <bearophileHUGS lycos.com>wrote:To be clear, I meant Java the language, not Java the VM. But yea, you're right, I probably overstated my point. What I have noticed though is, like Bruno said, a slowdown in Java language development and I can certainly imagine complications from the Oracle takeover of Sun. I've also been noticing decreasing interest in using Java (the language) for new projects (although, yes, not *completely* stalled interest) compared to 5-10 years ago, sharply increased awareness and recognition of Java's drawbacks compared to 5-10 years ago, and greatly increased interest and usage of other JVM languages besides Java. Ten years from now, I wouldn't at all be surprised to see Java (the language) being used primarily for maintenance on existing software that had already been written in Java. In fact, I'd be surprised if that doesn't end up being the case at that point. But I do imagine seeing things like D, Nemerle, Scala and Python thriving at that point.Bruno Medeiros:I have to agree with Bruno here, Java isn't going anywhere soon. It has an active community, corporations that very actively support it, and an open source effort that's probably the largest of any language (check out the Apache Foundation project lists). Toss in Clojure, Scala, Groovy, and their friends that can build on top of Java libraries, and you wind up with a package that isn't becoming obsolete any time soon.Java is quickly becoming a legacy language? the next COBOL? SRSLY?... Just two years ago, the now hugely popular Android platform choose Java as it's language of choice, and you think Java is becoming legacy?...Java on Adroid is not going well, there is a Oracle->Google lawsuit in progress. Google is interested in using a variant of NaCL on Android too.
Nov 23 2010
As for D lexers and tokenizers, what would be nice is to A) build an antlr grammar for D B) build D targets for antlr so that antlr can generate lexers and parsers in the D language. For B) I found http://www.mbutscher.de/antlrd/index.html For A) A good list of antlr grammars is at http://www.antlr.org/grammar/list, but there isn't a D grammar. These things wouldn't be an enormous amount of work to create and maintain, and, if done, anyone could parse D code in many languages, including Java and C which would make providing IDE features for D development easier in those languages (eclipse for instance), and you could build lexers and parsers in D using antlr grammars. -Mike On Fri, Nov 19, 2010 at 5:09 PM, Bruno Medeiros <brunodomedeiros+spam com.gmail> wrote:On 27/10/2010 22:43, Nick Sabalausky wrote:"retard"<re tard.com.invalid> wrote in message news:iaa44v$17sf$2 digitalmars.com...Java is quickly becoming a legacy language? the next COBOL? SRSLY?... Just two years ago, the now hugely popular Android platform choose Java as it's language of choice, and you think Java is becoming legacy?... The development of the Java language itself has stagnated over the last 6 years or so (especially due to corporate politics, which now has become even worse and uncertain with all the shit Oracle is doing), but that's a completely different statement from saying Java is becoming legacy. In fact, all the uproar and concern about the future of Java under Oracle, of the JVM, of the JCP (the body that regulates changes to Java),etc., is a testament to the huge popularity of Java. Otherwise people (and corporations) wouldn't care, they would just let it wither away with much less concern. -- Bruno Medeiros - Software EngineerI only meant that the widespead adoption of Java shows how the public at large cares very little about the performance issues you mentioned.The public at large is convinced that "Java is fast now, really!". So I'm not certain widespread adoption of Java necessarily indicates they don't care so much about performance. Of course, Java is quickly becoming a legacy language anyway (the next COBOL, IMO), so that throws another wrench into the works.
Nov 19 2010
On 19/11/2010 22:25, Michael Stover wrote:As for D lexers and tokenizers, what would be nice is to A) build an antlr grammar for D B) build D targets for antlr so that antlr can generate lexers and parsers in the D language. For B) I found http://www.mbutscher.de/antlrd/index.html For A) A good list of antlr grammars is at http://www.antlr.org/grammar/list, but there isn't a D grammar. These things wouldn't be an enormous amount of work to create and maintain, and, if done, anyone could parse D code in many languages, including Java and C which would make providing IDE features for D development easier in those languages (eclipse for instance), and you could build lexers and parsers in D using antlr grammars. -MikeYes, that would be much better. It would be directly and immediately useful for the DDT project: "But better yet would be to start coding our own custom parser (using a parser generator like ANTLR for example), that could really be tailored for IDE needs. In the medium/long term, that's probably what needs to be done. " in http://www.digitalmars.com/d/archives/digitalmars/D/ide/Future_of_Descent_and_D_Eclipse_IDE_635.html -- Bruno Medeiros - Software Engineer
Nov 19 2010
so that was 4 months ago - how do things currently stand on that initiative? -Mike On Fri, Nov 19, 2010 at 6:37 PM, Bruno Medeiros <brunodomedeiros+spam com.gmail> wrote:On 19/11/2010 22:25, Michael Stover wrote:As for D lexers and tokenizers, what would be nice is to A) build an antlr grammar for D B) build D targets for antlr so that antlr can generate lexers and parsers in the D language. For B) I found http://www.mbutscher.de/antlrd/index.html For A) A good list of antlr grammars is at http://www.antlr.org/grammar/list, but there isn't a D grammar. These things wouldn't be an enormous amount of work to create and maintain, and, if done, anyone could parse D code in many languages, including Java and C which would make providing IDE features for D development easier in those languages (eclipse for instance), and you could build lexers and parsers in D using antlr grammars. -MikeYes, that would be much better. It would be directly and immediately useful for the DDT project: "But better yet would be to start coding our own custom parser (using a parser generator like ANTLR for example), that could really be tailored for IDE needs. In the medium/long term, that's probably what needs to be done. " in http://www.digitalmars.com/d/archives/digitalmars/D/ide/Future_of_Descent_and_D_Eclipse_IDE_635.html -- Bruno Medeiros - Software Engineer
Nov 19 2010
On 20.11.2010 00:56, Michael Stover wrote:so that was 4 months ago - how do things currently stand on that initiative? -Mike On Fri, Nov 19, 2010 at 6:37 PM, Bruno Medeiros <brunodomedeiros+spam com.gmail> wrote: On 19/11/2010 22:25, Michael Stover wrote: As for D lexers and tokenizers, what would be nice is to A) build an antlr grammar for D B) build D targets for antlr so that antlr can generate lexers and parsers in the D language. For B) I found http://www.mbutscher.de/antlrd/index.html For A) A good list of antlr grammars is at http://www.antlr.org/grammar/list, but there isn't a D grammar. These things wouldn't be an enormous amount of work to create and maintain, and, if done, anyone could parse D code in many languages, including Java and C which would make providing IDE features for D development easier in those languages (eclipse for instance), and you could build lexers and parsers in D using antlr grammars. -Mike Yes, that would be much better. It would be directly and immediately useful for the DDT project: "But better yet would be to start coding our own custom parser (using a parser generator like ANTLR for example), that could really be tailored for IDE needs. In the medium/long term, that's probably what needs to be done. " in http://www.digitalmars.com/d/archives/digitalmars/D/ide/Future_of_Descent_and_D_Eclipse_IDE_635.html -- Bruno Medeiros - Software EngineerThere is a project with an ANTLR D grammar in the works. http://code.google.com/p/vs-d-integration/ Maybe this can be finished? matthias
Nov 20 2010
On 19/11/2010 23:56, Michael Stover wrote:so that was 4 months ago - how do things currently stand on that initiative? -Mike On Fri, Nov 19, 2010 at 6:37 PM, Bruno Medeiros <brunodomedeiros+spam com.gmail> wrote: On 19/11/2010 22:25, Michael Stover wrote: As for D lexers and tokenizers, what would be nice is to A) build an antlr grammar for D B) build D targets for antlr so that antlr can generate lexers and parsers in the D language. For B) I found http://www.mbutscher.de/antlrd/index.html For A) A good list of antlr grammars is at http://www.antlr.org/grammar/list, but there isn't a D grammar. These things wouldn't be an enormous amount of work to create and maintain, and, if done, anyone could parse D code in many languages, including Java and C which would make providing IDE features for D development easier in those languages (eclipse for instance), and you could build lexers and parsers in D using antlr grammars. -Mike Yes, that would be much better. It would be directly and immediately useful for the DDT project: "But better yet would be to start coding our own custom parser (using a parser generator like ANTLR for example), that could really be tailored for IDE needs. In the medium/long term, that's probably what needs to be done. " in http://www.digitalmars.com/d/archives/digitalmars/D/ide/Future_of_Descent_and_D_Eclipse_IDE_635.html -- Bruno Medeiros - Software EngineerI don't know about Ellery, as you can see in that thread he/she(?) mentioned interest in working on that, but I don't know anything more. As for me, I didn't work on that, nor did I plan to. Nor am I planning to anytime soon; DDT can handle things with the current parser for now (bugs can be fixed on the current code, perhaps some limitations can be resolved by merging some more code from DMD), so I'll likely work on other more important features before I go there. For example, I'll likely work on debugger integration, and code completion improvements before I would go on writing a new parser from scratch. Plus, it gives more time for someone else to hopefully work on it. :P Unlike Walter, I can't write a D parser in a weekend... :) Not even in a week, especially since I've never done anything of this kind before. -- Bruno Medeiros - Software Engineer
Nov 24 2010
On 11/24/2010 09:13 AM, Bruno Medeiros wrote:I don't know about Ellery, as you can see in that thread he/she(?) mentioned interest in working on that, but I don't know anything more.Normally I go by 'it'. Been pretty busy this semester, so I haven't been doing much. But the bottom line is, yes, I have working antlr grammars for D1 and D2, if you don't mind that 1) they're slow, 2) they're tied to a hacked-out version of the netbeans fork of ANTLR2, 3) they're tied to some custom Java code, and 4) I haven't been keeping the tree grammars up to date. I've not released them for those reasons. Semester will be over in about 3 weeks, though, and I'll have time then.As for me, I didn't work on that, nor did I plan to. Nor am I planning to anytime soon, DDT can handle things with the current parser for now (bugs can be fixed on the current code, perhaps some limitations can be resolved by merging some more code from DMD), so I'll likely work on other more important features before I go there. For example, I'll likely work on debugger integration, and code completion improvements before I would go on writing a new parser from scratch. Plus, it gives more time to hopefully someone else work on it. :P Unlike Walter, I can't write a D parser in a weekend... :) Not even on a week, especially since I never done anything of this kind before.It took me like 3 months to read his parser to figure out what was going on.
Nov 24 2010
On 24/11/2010 16:19, Ellery Newcomer wrote:On 11/24/2010 09:13 AM, Bruno Medeiros wrote:I didn't mean to offend or anything, I was just unsure of that. To me Ellery seems like a female name (but that can be a bias due to English not being my first language, or some other cultural thing). On the other hand, I would be surprised if a person of the female variety would be that interested in D, to the point of contributing in such a way.I don't know about Ellery, as you can see in that thread he/she(?) mentioned interest in working on that, but I don't know anything more.Normally I go by 'it'.Been pretty busy this semester, so I haven't been doing much. But the bottom line is, yes I have working antlr grammars for D1 and D2 if you don't mind 1) they're slow 2) they're tied to a hacked-out version of the netbeans fork of ANTLR2 3) they're tied to some custom java code 4) I haven't been keeping the tree grammars so up to date I've not released them for those reasons. Semester will be over in about 3 weeks, though, and I'll have time then.Hum, doesn't sound like it might be suitable for DDT, but I wasn't counting on that either.Not 3 man-months for sure, right? (Man-month in the sense of someone working 40 hours per week during a month.) -- Bruno Medeiros - Software EngineerAs for me, I didn't work on that, nor did I plan to. Nor am I planning to anytime soon, DDT can handle things with the current parser for now (bugs can be fixed on the current code, perhaps some limitations can be resolved by merging some more code from DMD), so I'll likely work on other more important features before I go there. For example, I'll likely work on debugger integration, and code completion improvements before I would go on writing a new parser from scratch. Plus, it gives more time to hopefully someone else work on it. :P Unlike Walter, I can't write a D parser in a weekend... :) Not even on a week, especially since I never done anything of this kind before.It took me like 3 months to read his parser to figure out what was going on.
Nov 24 2010
On 11/24/2010 02:09 PM, Bruno Medeiros wrote:I didn't meant to offend or anything, I was just unsure of that.None taken; I'm just laughing at you. As I understand it, though, 'Ellery' is a unisex name, so it is entirely ambiguous.Probably notIt took me like 3 months to read his parser to figure out what was going on.Not 3 man-months for sure!, right? (Man-month in the sense of someone working 40 hours per week during a month.)
Nov 24 2010
Bruno Medeiros:On the other hand, I would be surprised if a person of the female variety would be that interested in D, to the point of contributing in such a way.In Python newsgroups I have seen a few women now and then, but in the D newsgroup so far... not many. So far D seems a male thing. I don't know why. At the university, in the Computer Science course, there is a decent number of female students (and a few female teachers too). Bye, bearophile
Nov 24 2010
bearophile wrote:Bruno Medeiros:At my university there are *very* few women studying computer science. Most women sitting in CS lectures here are studying maths and have to do some basic CS lectures (I don't think they're the kind who would try D voluntarily). We have two female professors though.On the other hand, I would be surprised if a person of the female variety would be that interested in D, to the point of contributing in such way.In Python newsgroups I have seen few women, now and then, but in the D newsgroup so far... not many. So far D seems a male thing. I don't know why. At the university at the Computer Science course there are a good enough number of female students (and few female teachers too). Bye, bearophile
Nov 24 2010
"Daniel Gibson" <metalcaedes gmail.com> wrote in message news:icjv6l$p1r$2 digitalmars.com...bearophile schrieb:fest.Bruno Medeiros:At my university there are *very* few woman studying computer science. Most women sitting in CS lectures here are studying maths and have to do some basic CS lectures (I don't think they're the kind that would try D voluntarily). We have two female professors though.On the other hand, I would be surprised if a person of the female variety would be that interested in D, to the point of contributing in such way.In Python newsgroups I have seen few women, now and then, but in the D newsgroup so far... not many. So far D seems a male thing. I don't know why. At the university at the Computer Science course there are a good enough number of female students (and few female teachers too). Bye, bearophile
Nov 25 2010
On 24/11/2010 21:12, Daniel Gibson wrote:bearophile wrote:It is well known that there is a big gender gap in CS with regard to students and professionals. Something like 5-20% I guess, depending on university, company, etc. But the interesting thing (although also quite unfortunate) is that this gap takes an even greater dip downwards when you consider the communities of FOSS developers/contributors. It must be well below 1%! (note that I'm not talking about *users* of FOSS software, but only people who actually contribute code, whether for FOSS projects, or for their own indie/toy projects) -- Bruno Medeiros - Software EngineerBruno Medeiros:At my university there are *very* few woman studying computer science. Most women sitting in CS lectures here are studying maths and have to do some basic CS lectures (I don't think they're the kind that would try D voluntarily). We have two female professors though.On the other hand, I would be surprised if a person of the female variety would be that interested in D, to the point of contributing in such way.In Python newsgroups I have seen few women, now and then, but in the D newsgroup so far... not many. So far D seems a male thing. I don't know why. At the university at the Computer Science course there are a good enough number of female students (and few female teachers too). Bye, bearophile
Nov 26 2010
On 27/10/2010 22:04, retard wrote:Wed, 27 Oct 2010 13:52:29 -0700, Walter Bright wrote:"the widespead adoption of Java shows how the public at large cares very little about the performance issues you mentioned" WTF? The widespead adoption of Java means that _Java developers_ at large don't care about those performance issues (mostly because they work on stuff where they don't need to). But it's no statement about all the pool of developers. Java is hugely popular, but not in a "it's practically the only language people use" way. It's not like Windows on the desktop. -- Bruno Medeiros - Software Engineerretard wrote:I only meant that the widespead adoption of Java shows how the public at large cares very little about the performance issues you mentioned. Java is one of the most widely used languages and it's also successful in many fields. Things could be better from programming language theory's point of view, but the business world is more interesting in profits and the large pool of Java coders has given better benefits than more expressive languages. I don't think that says anything against my notes about algebraic data types.Wed, 27 Oct 2010 12:08:19 -0700, Walter Bright wrote:You told me that widespread use of Java proved that nothing more complex than what Java provides is useful: "Java is mostly used for general purpose programming so your claims about usefulness and the need for extreme performance look silly." I'd be surprised if you seriously meant that, as it implies that Java is the pinnacle of computer language design, but I can't resist teasing you about it. :-)retard wrote:I don't understand your logic -- Widespread use of Java proves that algebraic data types aren't a better suited way for expressing compiler's data structures such as syntax trees?This is why the basic data structure in functional languages, algebraic data types, suits better for this purpose.I think you recently demonstrated otherwise, as proven by the widespread use of Java :-)
Nov 19 2010
Walter Bright wrote:As we all know, tool support is important for D's success. Making tools easier to build will help with that. To that end, I think we need a lexer for the standard library - std.lang.d.lex. It would be helpful in writing color syntax highlighting filters, pretty printers, repl, doc generators, static analyzers, and even D compilers. It should: 1. support a range interface for its input, and a range interface for its output 2. optionally not generate lexical errors, but just try to recover and continue 3. optionally return comments and ddoc comments as tokens 4. the tokens should be a value type, not a reference type 5. generally follow along with the C++ one so that they can be maintained in tandem It can also serve as the basis for creating a javascript implementation that can be embedded into web pages for syntax highlighting, and eventually an std.lang.d.parse. Anyone want to own this?Intense support! Will someone do it?
Feb 26 2011
On Saturday 26 February 2011 02:06:18 dolive wrote:Walter Bright wrote:As we all know, tool support is important for D's success. Making tools easier to build will help with that. To that end, I think we need a lexer for the standard library - std.lang.d.lex. It would be helpful in writing color syntax highlighting filters, pretty printers, repl, doc generators, static analyzers, and even D compilers. It should: 1. support a range interface for its input, and a range interface for its output 2. optionally not generate lexical errors, but just try to recover and continue 3. optionally return comments and ddoc comments as tokens 4. the tokens should be a value type, not a reference type 5. generally follow along with the C++ one so that they can be maintained in tandem It can also serve as the basis for creating a javascript implementation that can be embedded into web pages for syntax highlighting, and eventually an std.lang.d.parse. Anyone want to own this?Intense support! Will someone do it?I'm working on it, but I have enough else going on right now that I'm not being very quick about it. I don't know when it will be done. - Jonathan M Davis
Feb 26 2011
Jonathan M Davis wrote:On Saturday 26 February 2011 02:06:18 dolive wrote:thanks, make an all-out effort!Walter Bright wrote:I'm working on it, but I have enough else going on right now that I'm not being very quick about it. I don't know when it will be done. - Jonathan M DavisAs we all know, tool support is important for D's success. Making tools easier to build will help with that. To that end, I think we need a lexer for the standard library - std.lang.d.lex. It would be helpful in writing color syntax highlighting filters, pretty printers, repl, doc generators, static analyzers, and even D compilers. It should: 1. support a range interface for its input, and a range interface for its output 2. optionally not generate lexical errors, but just try to recover and continue 3. optionally return comments and ddoc comments as tokens 4. the tokens should be a value type, not a reference type 5. generally follow along with the C++ one so that they can be maintained in tandem It can also serve as the basis for creating a javascript implementation that can be embedded into web pages for syntax highlighting, and eventually an std.lang.d.parse. Anyone want to own this?Intense support! Will someone do it?
Feb 26 2011
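For anyone picking this up, here is a minimal sketch of the shape that the range-interface and error-recovery points in the list quoted above ask for: value-type tokens, an input range of them, and "recover and continue" instead of a lexical error. The token set and names are invented for illustration and are not the eventual std.lang.d.lex API; it recognizes nothing but identifiers.

import std.ascii : isAlpha, isAlphaNum, isWhite;

// Illustrative only -- not the proposed std.lang.d.lex API.
enum TOK { identifier, unknown }

struct Token { TOK kind; string text; }

struct Lexer
{
    private string src;
    private Token cur;
    private bool haveToken;

    this(string source) { src = source; advance(); }

    @property bool empty() const { return !haveToken; }
    @property Token front() const { return cur; }
    void popFront() { advance(); }

    private void advance()
    {
        while (src.length && isWhite(src[0])) src = src[1 .. $];   // skip whitespace
        if (!src.length) { haveToken = false; return; }

        if (isAlpha(src[0]) || src[0] == '_')
        {
            size_t n = 1;
            while (n < src.length && (isAlphaNum(src[n]) || src[n] == '_')) ++n;
            cur = Token(TOK.identifier, src[0 .. n]);
            src = src[n .. $];
        }
        else
        {
            cur = Token(TOK.unknown, src[0 .. 1]);                 // recover and continue
            src = src[1 .. $];
        }
        haveToken = true;
    }
}

unittest
{
    import std.algorithm : equal, map;
    assert(equal(Lexer("foo bar + baz").map!(t => t.text), ["foo", "bar", "+", "baz"]));
}

A real std.lang.d.lex would of course accept any character range, know the full D token set, and optionally emit comments and ddoc comments as tokens, as the list above requires.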