digitalmars.D - Best practices for parsing files
- lurker (9/9) Jan 25 2007 Hi.
- BCS (7/23) Jan 25 2007 Enki would be my choice if you don't mind using a code generator
- lurker (11/11) Jan 25 2007 Both suggestions are very interesting and I'll be evaluating them; but w...
- Sean Kelly (6/17) Jan 25 2007 The DMD lexer works pretty much this way, and it's available in every
Hi. I'm new to D but not to programming. I would like to write a small scripting engine using the great D programming language, but I'm undecided on which techniques I should use to parse source files. Since slices seem to be a central feature of D, I was thinking of reading the whole file into memory and using slices to build the syntax tree. Does anyone have examples of parsing files using this method? Any other methods I should consider? Thanks.
Jan 25 2007
Reply to lurker,
> Hi. I'm new to D but not to programming. I would like to write a small scripting engine using the great D programming language, but I'm undecided on which techniques I should use to parse source files. Since slices seem to be a central feature of D, I was thinking of reading the whole file into memory and using slices to build the syntax tree. Does anyone have examples of parsing files using this method? Any other methods I should consider? Thanks.
Enki would be my choice if you don't mind using a code generator:
http://www.dsource.org/projects/ddl/wiki/Enki
If you are feeling adventurous you can try dparse:
http://www.dsource.org/projects/scrapple/browser/trunk/dparser/dparse.d
It's not very mature, but it's kind of fun to play with. (Full disclosure: I wrote dparse.)
Jan 25 2007
Both suggestions are very interesting and I'll be evaluating them, but what I was hoping for was something more along the lines of DMD's parser (which is insanely fast): a hand-written parser. We also thought of translating it to D just as an exercise to learn how it works. You see, one of my concerns (and the primary reason to use D) is parsing speed: I'm going to parse lots and lots of those files, and memory consumption is hardly an issue since we have lots of it. Also, the tasks will be executed on a thread pool and we don't want to face locking problems with code generated by some tool. At least if we write the code we'll know who to blame. :D Thanks.
Jan 25 2007
lurker wrote:
> Both suggestions are very interesting and I'll be evaluating them, but what I was hoping for was something more along the lines of DMD's parser (which is insanely fast): a hand-written parser. We also thought of translating it to D just as an exercise to learn how it works.
Somebody did that already, it's not been updated for a couple of months though:
http://www.dsource.org/projects/dparser
Jan 25 2007
Lutger wrote:
> Somebody did that already, it's not been updated for a couple of months though: http://www.dsource.org/projects/dparser
Didn't know that. Taking a look right now. Thanks.
Jan 25 2007
Reply to lurker,
> Both suggestions are very interesting and I'll be evaluating them, but what I was hoping for was something more along the lines of DMD's parser (which is insanely fast): a hand-written parser. We also thought of translating it to D just as an exercise to learn how it works. You see, one of my concerns (and the primary reason to use D) is parsing speed: I'm going to parse lots and lots of those files, and memory consumption is hardly an issue since we have lots of it.
Ah, then I guess you won't want an LL parser.
> Also, the tasks will be executed on a thread pool and we don't want to face locking problems with code generated by some tool. At least if we write the code we'll know who to blame. :D
Both should be thread safe (if you stick to one thread per file).
As far as slicing goes, I'm working on a parser that reads a file into memory (I guess it could mmap it in as well) and converts it to an array of token structs. A parser will then walk the array. If you allocate (new) a big array of structs in advance and have your lexer write directly to the array (slicing out of the file where the text is important), that should be fairly fast.
That's my 2 cents. I'm not sure how much help this will be (my parser is /not/ performance driven), but I hope it might help.
Jan 25 2007
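The approach BCS describes (read the file into memory, preallocate an array of token structs, have the lexer store slices of the buffer instead of copies) might look roughly like the sketch below. It is not BCS's code or the DMD lexer: `Token`, `TokKind` and `lex` are hypothetical names, the token rules are deliberately tiny and ASCII-only, and the syntax is present-day D rather than the D of 2007. In a real program the source buffer would come from something like `std.file.readText` rather than a literal.

```d
import std.ascii : isAlpha, isAlphaNum, isDigit, isWhite;

enum TokKind { identifier, number, symbol }

struct Token
{
    TokKind kind;
    const(char)[] text; // slice into the source buffer, not a copy
}

// Hypothetical lexer: writes tokens into caller-supplied storage and
// returns the part of it that was filled. Stops early if storage is full.
Token[] lex(const(char)[] src, Token[] storage)
{
    size_t n = 0; // tokens written so far
    size_t i = 0; // current position in src

    while (i < src.length && n < storage.length)
    {
        if (isWhite(src[i])) { ++i; continue; }

        immutable start = i;
        TokKind kind;
        if (isAlpha(src[i]) || src[i] == '_')
        {
            while (i < src.length && (isAlphaNum(src[i]) || src[i] == '_')) ++i;
            kind = TokKind.identifier;
        }
        else if (isDigit(src[i]))
        {
            while (i < src.length && isDigit(src[i])) ++i;
            kind = TokKind.number;
        }
        else
        {
            ++i; // any other single character is its own token
            kind = TokKind.symbol;
        }
        storage[n++] = Token(kind, src[start .. i]);
    }
    return storage[0 .. n];
}

void main()
{
    const(char)[] src = "x = foo(42)";
    auto storage = new Token[src.length]; // at most one token per character
    auto toks = lex(src, storage);

    assert(toks.length == 6);
    assert(toks[2].text == "foo" && toks[2].kind == TokKind.identifier);
    assert(toks[4].text == "42" && toks[4].kind == TokKind.number);
}
```

Because every `Token.text` points into `src`, the buffer has to stay alive and unmodified for as long as the tokens are in use. With one buffer and one token array per file, nothing is shared between threads.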
BCS wrote:
> As far as slicing goes, I'm working on a parser that reads a file into memory (I guess it could mmap it in as well) and converts it to an array of token structs. A parser will then walk the array. If you allocate (new) a big array of structs in advance and have your lexer write directly to the array (slicing out of the file where the text is important), that should be fairly fast.
Excellent! Is any of your code available? We would really like to take a look at it (if possible). We are a little lost right now, and from your description it seems very much like what we want to build. Thanks
Jan 25 2007
Reply to lurker,
> BCS wrote:
> > As far as slicing goes, I'm working on a parser that reads a file into memory (I guess it could mmap it in as well) and converts it to an array of token structs. A parser will then walk the array. If you allocate (new) a big array of structs in advance and have your lexer write directly to the array (slicing out of the file where the text is important), that should be fairly fast.
> Excellent! Is any of your code available? We would really like to take a look at it (if possible). We are a little lost right now, and from your description it seems very much like what we want to build. Thanks
That isn't how my lexer works (at the moment); I was just saying I think it could be done. In fact, my app copies everything to make sure that it doesn't stomp on itself. OTOH, it wouldn't be too hard to port it to what I described above, and I plan on posting the code when I get a bit closer to done.
Jan 26 2007
lurker wrote:
> Hi. I'm new to D but not to programming. I would like to write a small scripting engine using the great D programming language, but I'm undecided on which techniques I should use to parse source files. Since slices seem to be a central feature of D, I was thinking of reading the whole file into memory and using slices to build the syntax tree. Does anyone have examples of parsing files using this method?
The DMD lexer works pretty much this way, and it's available in every DMD distribution :-)
> Any other methods I should consider?
This is the method I've used in the past, even in C++. It seems to make for cleaner code than the allocate/copy method, and it's faster to boot.
Sean
Jan 25 2007
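To make the contrast Sean draws concrete, here is a small sketch of slicing versus the allocate/copy method in present-day D. The helper `isSliceOf` is a hypothetical name introduced only for this illustration; the point is that a slice is just a pointer and a length into the original buffer, while `.dup` allocates a new array and copies the characters into it.

```d
// True if `part` refers to memory inside `whole`, i.e. it is a slice of it.
bool isSliceOf(const(char)[] part, const(char)[] whole)
{
    return part.ptr >= whole.ptr
        && part.ptr + part.length <= whole.ptr + whole.length;
}

void main()
{
    string src = "let answer = 42";

    // Slicing: a pointer/length pair into src; nothing is allocated or copied.
    string name = src[4 .. 10];
    assert(name == "answer");
    assert(isSliceOf(name, src));

    // Allocate/copy: .dup creates a fresh array with the same contents.
    char[] copy = name.dup;
    assert(copy == name);
    assert(!isSliceOf(copy, src));
}
```

The slice costs no allocation per token, which is where the speed advantage comes from. The trade-off is that every slice keeps the whole source buffer reachable, so the copy approach can still make sense when only a few small pieces of a large file need to outlive it.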