
digitalmars.D.learn - Walnut

reply Dan <murpsoft hotmail.com> writes:
Hi guys,

I've been working on the Walnut project for a while now by myself, and it's got
one major component remaining before the alpha debugging can start going down -
the parser.

http://dsource.org/projects/walnut/browser/branches/1.9/

The code seems pretty elegant so far, and it should be easy enough to read.  I
recommend starting in Value.d, but I'm sure there are lots of things I've done
wrong while programming it.  I was hoping I could convince some people to take
a look at it and point out my mistakes.

I'm going to be studying how to use Trac today, so if you'd be so kind, please
use the forums for now.

Much appreciated,
Regards,
Dan
Dec 31 2007
parent reply Alan Knowles <alan akbkhome.com> writes:
Dan wrote:
 Hi guys,
 
 I've been working on the Walnut project for a while now by myself, and it's
got one major component remaining before the alpha debugging can start going
down - the parser.
 
 http://dsource.org/projects/walnut/browser/branches/1.9/
 
 The code seems pretty elegant so far, and it should be easy enough to read.  I
recommend starting in Value.d, but I'm sure there are lots of things I've done
wrong while programming it.  I was hoping I could convince some people to take
a look at it and point out my mistakes.
 
As you've opened the door - please regard the below as my personal opinion, and take it as such... ;)

Value.d:
You have made quite heavy use of op*** magic methods. Having done this before a few times, it always bites me in the ass later on - you have to remember what magic is going to occur when you assign/create.

It may be better to switch to more obvious/classic methods: overloaded constructors or an overloaded static method "construct()", to[typename], index(int id), etc. This is going to make the future code a lot easier to read, understand and maintain (and enable others to quickly work out what is going on).

structure.d:
I would be tempted to create a method to generate this type of code, rather than trying to create a solution (the op*** stuff) for a problem that would not have existed if you had gone down the road of using code generators. Have a look at this to get an idea:
http://www.akbkhome.com/svn/gtkDS/wrap/APILookupPhobos.txt

text.d:
Some of this could be autogenerated, and enum'd (might be clearer), e.g. TEXT.undefined

methods.d:
Again, using a code generator would give you the benefits of documentation and of using static D classes to encapsulate each Javascript class (along with making smaller, more manageable files).

Not sure why your standard method call is using varargs (...) - unless I misread the code.

------------
interpreter.d (might be better to rename it tokenizer.d)

Looks like the next big jobs would be:
finish tokenizing (and do some test cases)
creating the opcodes... (this part looks painful and time-consuming, having seen the dmdscript version - parser / expression / statement etc.)
Scope Management?
Opcode runtime (~2800 lines of code in dmdscript)

---
Unfortunately I detest forums - old school (or stubborn); I prefer good ole mailing lists (which I have for most D newsgroups, as I pull the nntp feed into my mailbox).

I'm keeping an eye on Walnut, but since most of my needs for Javascript/DMDScript mean getting results very quickly from hacks to DMDScript, I can't really justify too much real help to Walnut unfortunately - but do keep working on it. As soon as the opcode runtime and parser/opcode builder are done, I'd be pretty keen to retarget all the binding code for DMDScript to be Walnut only.

Regards
Alan
 I'm going to be studying how to use Trac today, so if you'd be so kind, please
use the forums for now.
 
 Much appreciated,
 Regards,
 Dan
 
Jan 01 2008
parent reply Dan Lewis <murpsoft hotmail.com> writes:
Alan Knowles Wrote:
 As you've opened the door - please regard the below as my personal 
 opinon, and take as such... ;)
Of course. : )

If I may, please don't take this as a rejection of your help. I'm simply explaining how I've come to where I am. I'll probably implement at least some of these right away.
 
 Value.d:
 You have made quite heavy use of op*** magic methods, Having done this 
 before a few times, it always bites me in the ass later on.. - as you 
 have to remember what magic is going to occur when you assign/create.
Yeah, the opAssign/opCall is only being used so I can go:

Value v = cast(Value) 4;

instead of:

Value v;
v.i = 4;
v.type = TYPE.NUMBER;

The only magic that ever happens there should be the automatic type property assignment.

Then there's the opCall(Value, Value, Value[] ...) which is to call Functions, and the opIndex, opIndexAssign, opIn_r which are to use Values as Objects.

The promotion of the Value struct to hold Function, Array and Object is a blatant disregard of the ECMAScript spec; however, it *is* semantically consistent, and consistent with the language itself. It could bring a significant structural advantage, as we now have a single primitive to work with - and the original form needed to disambiguate the type of a Value anyway.
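To spell the magic out, it's roughly the following - a much-simplified sketch with made-up field names and a D2-style string, not the actual Value.d:

enum TYPE { UNDEFINED, NUMBER, STRING, OBJECT }

struct Value
{
    TYPE type;
    double num;
    string str;
    Value[string] props;   // the real code avoids AAs; this is only for the sketch

    // "construction" magic: Value(4)
    static Value opCall(double n)
    {
        Value v;
        v.type = TYPE.NUMBER;
        v.num = n;
        return v;
    }

    // assignment magic: v = 4 keeps the type tag in sync automatically
    void opAssign(double n) { type = TYPE.NUMBER; num = n; }

    // object-style property access: v["x"] and v["x"] = w
    // (missing keys would need real handling; this is the bare shape)
    Value opIndex(string name) { return props[name]; }
    void opIndexAssign(Value v, string name) { props[name] = v; }
}

void example()
{
    Value v = Value(4);    // static opCall sets the tag for us
    v = 5;                 // opAssign, tag stays NUMBER
    Value o;
    o.type = TYPE.OBJECT;
    o["x"] = v;            // opIndexAssign: a Value used as an Object
}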
 
 It may be better to switch to more obvious/classic methods, - overloaded 
 constructors or an overloaded static method "construct()", / 
 to[typename], index(int id) etc.
I am actually starting to think that the Value.to[typename] format is cumbersome. In all honesty, I'm not sure whether the output of Value.toString() on a Number object containing a value of 4 should be "4", "[object Number]", "4.0" (it's a double), or what. I was then wondering how this should relate to the methods that we have: Object_prototype_toString, RegExp_prototype_source, etc.

So there'll be a semantic change there somewhere to disambiguate, as I'm sure we both agree that ambiguity is bad.
 = This is going to make the future code alot easier to read, and 
 understand. (along with maintain, enable others to quickly work out what 
 is going on )
 
 structure.d:
 I would be tempted to create a method to generate this type of code, 
It is very tedious to maintain that one. I'll probably try to do something like that soon.

To expand, I had originally hoped to be able to use associative arrays, but they apparently contain a pointer to a complex hashing structure with even more pointers below. I had hoped to simply sort the char[]-pointing structures alphabetically by string and do a binary search, which is probably faster for the small sets typically used for ECMAScript objects.

The structure.d file was an effort to create a static literal which wouldn't need any memcpy or anything of the sort; it would be loaded in via DMA straight from the file and be usable immediately.
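The lookup I have in mind is roughly this (made-up names, not structure.d itself, and Value here is just a one-field stand-in) - keep the properties sorted by key, do a plain binary search, and the whole table can sit in the static data segment:

struct Value { double num; }   // stand-in for the real Value struct

struct Property
{
    char[] key;
    Value  val;
}

// returns a pointer to the property, or null if it isn't there
Property* lookup(Property[] props, char[] key)
{
    size_t lo = 0, hi = props.length;
    while (lo < hi)
    {
        size_t mid = (lo + hi) / 2;
        if (props[mid].key == key) return &props[mid];
        if (props[mid].key < key)  lo = mid + 1;   // array comparison is lexicographic
        else                       hi = mid;
    }
    return null;
}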
 rather than trying to create a solution(the op*** stuff) for a problem 
 that would not have existed if you had gone down the road of using code 
 generators..
 Have a look at this to get an idea..
 http://www.akbkhome.com/svn/gtkDS/wrap/APILookupPhobos.txt
 
 text.d:
 some of this could be autogenerated, and enum'd (might be clearer)
 eg...  TEXT.undefined
Definitely like the enum notation better. : )
 
 methods.d:
 again, using a code generator would give you benefit's of documentation 
 and using static D classes to encapsulate each Javascript class (along 
 with making smaller more manageable files..)
When I converted Walnut 1.x from DMDScript, I was mostly doing it to understand what Walter had written and learn how a good implementation looks. I noticed that there was a lot of redundancy in each of the files, and that my head was filling with all sorts of different constructs as I examined each file. That's why I converted it to aspect oriented. Now the code is so boringly simple that, apart from value.d, it reads like a list.

The problem with encapsulating JavaScript classes with D classes is that the spec requires you to be able to expand JavaScript objects, so you eventually have to use an array notation inside them, as per DMDScript and SpiderMonkey. You end up duplicating several properties between the array notation and the class notation, and there's extensive code to look up the address of an ECMAScript property. This is why even DMDScript property lookup is a few times slower than Lua or Io.
 
 Not sure why your standard method call is using varargs (...) - unless I 
 misread the code..
The Value[] arguments was originally not varargs, and you could pass it an array of Values just fine. My interpretation of varargs is that it converts a set of Values to a Value[] at the caller by prepending the length? So the varargs would simply mean you can now call the function passing:

(self, cc, arg1, arg2, arg3)

as well as:

Value[] args = [ arg1, arg2, arg3 ];
(self, cc, args)

and the call would look identical.
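As far as I can tell that's exactly how D's typed varargs behave - a dynamic array is just a length plus a pointer, so the callee sees the same Value[] slice either way. A tiny standalone check (made-up names, Value trimmed to one field):

import std.stdio;

struct Value { double num; }

Value nativeCall(Value self, Value cc, Value[] args ...)
{
    writefln("%s args", args.length);
    return Value.init;
}

void main()
{
    Value self, cc;
    Value a, b, c;
    nativeCall(self, cc, a, b, c);    // arguments listed individually
    Value[] packed = [a, b, c];
    nativeCall(self, cc, packed);     // or passed as one array - same callee view
}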
 interpreter.d
 (might be better to rename it tokenizer.d)
I'm (now) hoping to run the parser algorithms from the same file, and to make sure it inlines the lexer. I'm not sure if I want to generate tokens and then interpret them, or if I can use the finite state brought about by position in the lexer switch to mean the same thing (preventing a double switch). The problem with that is that I can't seem to think beyond one token very well - the same problem faced by the guys who invented the separation of lexer, parser and interpreter.
 
 Looks like the next big jobs would be:
 finish tokenizing (and do some test cases)
 creating the OPcodes.... (this part looks painful and time-consuming, 
 having seen dmdscript version. - parser / expression / statement etc.)
Yup. I'm facing some analysis paralysis on this one, trying to come up with something cool (the wheel's already been invented, so why not do it round this time?)
 
 Scope Management?
 Opcode runtime (~2800 lines of code in dmdscript)
Yeah, I was hoping to tie scope in with something during parsing of {}. I've already got a Global object which is already being looked at for non-keywords in my [rather pathetic so far] lexer. I think what I want is a bunch of Values which are of TYPE.OBJECT, or perhaps a new type just like it, which carry variables and such.

If I compile all functions down to (unoptimized) native code with the same call interface as the natives (my dream), then I could probably just use the stack to handle scope the natural way, instead of faking it like most interpreters do.
 
 ---
 Unfortunately I detest forum's - old school (or stuborn), i prefer good 
 ole mailing lists (which I have for most D newsgroups, as I pull the 
 nntp feed into my mailbox)
 
 I'm keeping an eye on Walnut, but I since most of my needs for 
 Javascript/DMDscript mean getting results very quickly from hacks to 
 DMDscript, I cant really justify to much real help to Walnut 
 unfortunately - but do keep working on it, as soon as the opcode 
 runtime, parser/opcode builder are done, I'd be pretty keen to retarget 
 all the binding code for DMDscript to be Walnut only.
Actually, Walnut 1.0 is branched from DMDScript, but I reformatted it, cleaned it up and the like. It almost has native ActiveX, more so than JScript. But there are major bugs that I don't understand. Perhaps you'd be more prone to help there than with Walnut 2.x.

Well, that was a HUGE ramble.

Regards,
Dan
Jan 02 2008
parent reply Alan Knowles <alan akbkhome.com> writes:
snip snip..  - lots of bits inline.

Actually I forgot to mention - have you seen ECMAScript 4 (the new one)? 
While aiming for that as a target may be a bit adventurous, it's 
probably worth thinking about how some of it will be implemented eventually.

 Yeah, the opAssign/opCall is only being used so I can go:
 Value v = cast(Value) 4;

 instead of:
 Value v;
 v.i = 4;
 v.type = TYPE.NUMBER;
   
Yeah, I was hoping for something like:

auto v = new Value(4);

which is roughly the same length, and is a little clearer - probably worth thinking about for new code. (If you decide to go for it, I might get bored one day and help you refactor the old code ;)
 The only magic that ever happens there should be the automatic type property
assignment.

 Then there's the opCall(Value, Value, Value[] ...) which is to call Functions,
and the opIndex, opIndexAssign, opIn_r which are to use Values as Objects.

 The promotion of the Value struct to hold Function, Array and Object is a
blatant diregard of the ECMAScript spec, however, it *is* semantically
consistent, and consistent with the language itself.  It could be used to bring
a significant structural advantage as we now have a single primitive to work
with; and since the original form needed to disambiguate the type of a Value
anyways.
   
I suspect this may get into a bit of trouble when you deal with some of the weird and wonderful scoping stuff in Javascript. From what I remember:

CallableFunction extends Object
Value can hold an object...

CallableFunction holds a reference to the FunctionDefinition (code etc.)
FunctionDefinition holds a reference to the creation scope...
   
 It may be better to switch to more obvious/classic methods, - overloaded 
 constructors or an overloaded static method "construct()", / 
 to[typename], index(int id) etc.
     
I am actually starting to think that the Value.to[typename] format is cumbersome, as in all honesty, I'm not sure whether the output of a Value.toString() which is a Number object containing a value of 4 should be "4", "[object Number]", "4.0" (it's a double), or what. I was then wondering how this should relate to the methods that we have; Object_prototype_toString, RegExp_prototype_source, etc. So, there'll be a semantic change there somewhere to disambiguate, as I'm sure we both agree that ambiguity is bad.
As D uses the cast keyword, it's actually marginally shorter:

a = cast(String) theval;
a = theval.toString();

Obviously, you could use as[typename] to make it distinct...
   
 = This is going to make the future code alot easier to read, and 
 understand. (along with maintain, enable others to quickly work out what 
 is going on )

 structure.d:
 I would be tempted to create a method to generate this type of code, 
     
It is very tedious to maintain that one. I'll probably try to do something like that soon. To expand, I had originally hoped to be able to use Associative Arrays, but they apparently contain a pointer to a complex hashing structure with even more pointers below. I had hoped to simply sort the char[] pointing structures based on the strings alphabetically and do a binary search; which is probably faster for the small sets typically used for ECMAScript objects. The structure.d file was an effort to create a static literal which wouldn't need any memcpy or anything of the sort; it would be loaded in via DMA straight from the file and be useable immediately.
When you start binding something like gtk, with craploads of enums expressed as object properties, the whole lookup stuff gets even more complex. I suspect the answer is to have methods to add/get properties etc., use assoc. arrays to start with, and then optimize the crap out of it later.
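Something along these lines is all I mean (hypothetical names, with a stand-in Value) - callers only ever see get/set, so the storage behind them can go from an AA to something cleverer later without touching anything else:

struct Value { double num; }    // stand-in for your Value struct

struct JSObject
{
    Value[string] props;        // the builtin AA, to start with

    Value get(string name)
    {
        if (auto p = name in props) return *p;
        return Value.init;      // i.e. undefined
    }

    void set(string name, Value val) { props[name] = val; }
}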
   

 When I converted Walnut 1.x from DMDScript, I was mostly doing it to
understand more of what Walter had written to learn how a good implementation
looks.

 I noticed that there was alot of redundancy in each of the files, and that my
head was filling with all sorts of different constructs as I examined each
file.  That's why I converted it to aspect oriented.  Now the code is so
boringly simple that apart from value.d it reads like a list.

 The problem with encapsulating JavaScript classes with D classes is that spec
requires you to be able to expand JavaScript objects, so you eventually have to
use an array notation inside that; as per DMDScript and Spidermonkey.  You end
up duplicating several properties inside the array notation and class notation;
and there's extensive code to look up the address of an ECMAscript property. 
This is why even DMDScript property lookup is a few times slower than Lua or Io.
   
Most of the classes are really just static classes - just used to tidy up the code, rather than actually doing real encapsulation. E.g.:

static class Date {
    void getHours(....) { }
    void getTime(....) { }
}

Although I have to add prefixes in the code generator when doing bindings, as library writers seem to have a horrible habit of using D keywords or common method names ;)
   
 Not sure why your standard method call is using varargs (...) - unless I 
 misread the code..
     
The Value[] arguments was originally not a varargs, and you could pass it an array of Values just fine. My interpretation of varargs is that it converts a set of Values to a Value[] at the caller by prepending the length? So the varargs would simply mean you can now call the function passing: (self, cc, arg1, arg2, arg3), as well as: Value[] args = { arg1, arg2, arg3 }; (self, cc, args) and that the call would look identical.
Mmh, kind of cute ;) - that reminds me of those cool language features that confuse other people when they first see them:

[string] / [number] => resulting in an array of strings [Pike]
   
 interpreter.d
 (might be better to rename it tokenizer.d)
     
I'm (now) hoping to run the parser algorithms from the same file, and making sure it inlines the lexer. I'm not sure if I want to generate tokens and then interpret them, or if I can use the finite state brought about by position in the lexer switch to somehow mean the same (preventing a double-switch). The problem with that is that I can't seem to think beyond one token very well - the same one faced by the guys who invented separation of lexer, parser, interpreter.
Actually having the tokenizer available is really useful -
http://www.akbkhome.com/blog.php/View/156/Script_Crusher.html

I think you are being a bit hopeful on that. I face the same problem with the steps: by the time I've finished the parser, my brain is usually at exploding point and I give up ;)

I was wondering - although you can not copy DMDScript directly, if someone (eg. me) wrote a summary of the steps that were involved in the parser/code gen stage (and possibly down to opcodes), then you would not be breaking copyright??? Based on code documentation, rather than actual code???? It would save you a considerable amount of pain.....
   
 Scope Management?
 Opcode runtime (~2800 lines of code in dmdscript)
     
Yeah, I was hoping to tie scope in with something during parsing of {}. I've already got a Global object which is already being looked at for non-keywords in my [rather pathetic so far] lexer. I think what I want is a bunch of Value's, which are of TYPE.OBJECT, or perhaps a new type just like it, which carry variables and stuff.
I need to understand how Walter solved the closures bug in the last release - I copied the code into my repo, but did not have time to understand it.

The problem you get with Javascript is that the scope is not only from Global, but also the creation scope (which may not be known at compile time), and outer layers (eg. functions within functions etc.)
 If I compile all functions down to (unoptimized) native code with the same
call interface as the natives (my dream) then I could probably just use the
stack to handle scope as per the natural way instead of faking it like most
interpreters.
   
I've looked at this a few times; I don't think you will ever get native code out of a scripted language very well (let alone understanding gcc's internals to make it happen ;). One thing to think about is how it may be possible to write your opcode arrays to memory, and how to duplicate your stack (so that your interpreter can eventually handle multi-threaded applications); key to this is making the Value object serializable/unserializable.
 Actually, Walnut 1.0 is branched from DMDScript, but I reformatted it, cleaned
it up and the likes.  It almost has native ActiveX, moreso than JScript.  But
there are major bugs that I don't understand.  Perhaps you'd be more prone to
help there than Walnut 2.x.
   
Have you updated Walnut 1.0 with Walter's last change - the closure fix?

Yes, there's a lot of other stuff I've added to DMDScript that could do with a better home ;)

Regards
Alan
 Well, that was a HUGE ramble.
 Regards,
 Dan
   
Jan 02 2008
parent reply Dan Lewis <murpsoft hotmail.com> writes:
Alan Knowles Wrote:

 snip snip..  - lots of bits inline.
 
 Actually I forgot to mention - have you seen ECMAscript 4 (the new one) 
Yeah, so far I've been targeting ECMAScript 3, but I've seen 4 and have it in mind. I figured I'd worry about the difference after it was running js files.
 Yeah, the opAssign/opCall is only being used so I can go:
 Value v = cast(Value) 4;

 instead of:
 Value v;
 v.i = 4;
 v.type = TYPE.NUMBER;
   
yeah I was hoping for something like: auto v = new Value(4);
I wish I could, but auto only accepts "simple" data types, and constructors can't even be faked in structs. One would need to store it in a class, which involves keeping unwieldy, opaque vtbls and forces us to use the heap and pass by reference (structs can go either way).

What we can do now is go:

Value v = 4;

and have it correctly use opAssign and opCall. What it fails to do is handle things like:

Value myFunc() { return 4; }
(If you decide to go for it, I might 
 get bored one day, and help you refactor the old code ;)
You're always welcome to try refactoring it any which way. If the resultant program is more elegant, it goes in my source.
 I suspect this may get into a bit of trouble when you deal with some of 
 the weird and wonderfull scoping stuff with Javascript.
  From what I remember:
 
 CallableFunction extends Object
 Value can hold an object...
 
 CallableFunction holds a reference to the FunctionDefinition (code etc.)
 FunctionDefinition holds a reference to the Creation scope...
Actually, that's pretty easy. Value stores both callable js functions and js objects, and the js functions hold pointers to native functions.

Getting more challenging, the scope for the function takes two aspects: first we have a bunch of identifiers that we need to know, and second, we need those to be local to the function. My plan was to essentially push Values onto the local part of the call stack (below EBP).
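In sketch form (hypothetical layout, trimmed right down), a callable Value is just a Value whose payload is a pointer to a native function with the standard call interface:

struct Value
{
    double num;     // plus the type tag, string, properties etc. in the real thing
    Value function(Value self, Value cc, Value[] args) native;

    // opCall magic, so a function-Value can be invoked directly: fn(self, cc, args)
    Value opCall(Value self, Value cc, Value[] args)
    {
        return native(self, cc, args);
    }
}

// a "native" with the common call interface
Value addAll(Value self, Value cc, Value[] args)
{
    Value r;
    r.num = 0;
    foreach (a; args) r.num += a.num;
    return r;
}

void example()
{
    Value fn;
    fn.native = &addAll;
    Value one, two;
    one.num = 1; two.num = 2;
    Value self, cc;
    Value r = fn(self, cc, [one, two]);   // opCall dispatches to the native
}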
 It may be better to switch to more obvious/classic methods, - overloaded 
 constructors or an overloaded static method "construct()", / 
 to[typename], index(int id) etc.
as D uses the cast keyword, it's actually marginally shorter, a = cast(String) theval
The problem is that cast(string) X can only be defined either on the string type (which is native to D), or for *one* type via the opCast method. The reason for only being able to do it for one type is that D functions can't be overloaded on return type alone (I dunno, ask Walter?)
 a = theval.toString();
 obviously, you could use as[typename], to make it distinct...
Yeah, that could work.
 I would be tempted to create a method to generate this type of code, 
It is very tedious to maintain that one. I'll probably try to do something like that soon.
When you start binding something like gtk, with craploads of enum's expressed as object properties, the whole lookup stuff gets even more complex. I suspect the answer is to have a method to add/get property etc. and use assoc. arrays to start with, then optimize the crap out of it later..
Yeah, I would, except I wouldn't be able to optimize out AAs - they're part of D, not part of my code. What I believe I could do is use AAs now, and later switch to opIndex, opAssign, opIn_r; but essentially this doesn't provide much benefit over what I've got now - the produced functionality is the same, and I *can* declare a static literal.
 Most of the classes are really just static classes - just used to tidy 
 up the code, rather than actually doing real encapsulation.
 eg.
 static class Date {
       void  getHours(....) { }
       void  getTime(....) { }
 }
Oh, I didn't know that got optimized out.
 Although I have to add prefixes in the code generator when doing 
 bindings, as library writers seem to have a horible habit of using D 
 keywords or common method names ;)
Yeah, I don't mind folks using common names, as long as they tidy up their namespaces. Walnut at this point is not tidy or threadsafe. My program is using static global variables for now, and none of the modules identify themselves as walnut.module.
 interpreter.d
 (might be better to rename it tokenizer.d)
Actually having the tokenizer available is really usefull - http://www.akbkhome.com/blog.php/View/156/Script_Crusher.html
To follow up, today I created the functions interpret() and tokenize(). The tokenizer returns Values, which can now also be non-morphemic tokens (like '=' and 'a bunch of stuff in parens'). The tokenizer puts data into the Value for morphemic tokens (like numbers, strings, etc.). This is actually a good deal better than DMD because the source only needs to be read once.
 I think you are being a bit hopeful on that. - I face the same problem 
 with the steps, by the time I've finished the parser, my brain is usualy 
 at exploding point and I give up ;)
Yeah, I realized that you can only efficiently have a single instruction address, which is why I couldn't move beyond the current token. One could theoretically write a predictive parser, but those are evil.
 
 I was wondering, although you can not copy DMDscript directly, If 
 someone (eg. me) wrote a summary of the steps that where involved in the 
 parser/code gen stage.. - and posibly to opcodes, then you would not be 
 breaking copyright??? based on code documentation, rather than actual 
 code????
 - It would save you a considerable amount of pain.....
Yes, except the objective isn't to copy DMDScript without the license; the objective is to create an engine that's significantly better. At the moment, I would say roughly half the code is written and I'm using 108KB vs DMDScript's 513KB. The parser is the only remaining component before it can (incorrectly) run javascript files. The rest is debugging.
 I need to understand how Walter solved the closures bug in the last 
 release - I copied the code into my repo, but did not have time to 
 understand it.
 The problems you get with Javascript, is that the scope is not only from 
 Global, but also creation scope (which may not be a compile time).. and 
 outer layers (eg. functions within functions etc.)
You have a scope chain, essentially: every function containing this one (including global) is checked in order, from this context up to global. This can be done by examining the stack during runtime, which should only be storing Value structs, so it should be pretty readily understood.
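In data-structure terms it's no more than this (made-up names, stand-in Value; the real thing would walk stack frames of Values rather than an AA per scope):

struct Value { double num; }    // stand-in for the real Value struct

struct Scope
{
    Value[string] vars;         // identifiers local to this activation
    Scope* outer;               // enclosing function's scope; null past Global
}

// look an identifier up from the innermost scope outward
Value* resolve(Scope* s, string name)
{
    for (; s !is null; s = s.outer)
        if (auto p = name in s.vars)
            return p;
    return null;                // ReferenceError territory
}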
 If I compile all functions down to (unoptimized) native code with the same
call interface as the natives (my dream) then I could probably just use the
stack to handle scope as per the natural way instead of faking it like most
interpreters.
   
I've looked at this a few times, I dont think you will ever get native code out of a scripted language very well. (let alone understanding gcc's internals to make it happen ;) - One thing to think about is how it may be possible to write your opcode arrays to memory, and how to
The plan is to write opcodes to memory strictly operating on whatever's in my Value structs. There isn't any ambiguity between types or sizes, and I can probably take advantage of D's inline asm statements to ease things a bit.
 duplicate your stack (so that  you r interpreter can eventually handle  
 multi-threaded applications), key to this is making the Value object 
 serializable/unserializable..
You mean like fork()? I'm thinking of being able to run multiple instances of ecmascript by calling:

global = Global_init();
interpret(source, global, args);

as many times as I like within the same program, from different threads (which someone else can go ahead and figure out how to make).

Value is already serializable - it's a struct. One of the reasons I hate classes is because they're opaque, and thus very hard to serialize.
 Have you updated Walnut 1.0 with Walter's last change? - the closure fix?
Nope. I should though, and I should make 1.x run on D 2.x
 Yes, There are alot of other stuff I've added to DMDscript that could do 
 with a better home ;)
Highly interested. Also, 1.x almost has native ActiveXObject. It needs a few bugs worked out, and I haven't had the brainpower to face it again for a while.

At the moment, fromVariant isn't recognizing whatever type is being passed for numbers (as seen by running test\activex.nut). It's recognizing functions and I think even letting you call them. It's enumerating the properties perfectly. I had to comment out the Put method, but I'd like to refactor set and setByRef into that. That would make ActiveXObject more native to Walnut 1.x than to JScript. : p

Regards,
Dan
Jan 02 2008
parent reply Alan Knowles <alan akbkhome.com> writes:
  .
 
 Yes, except the object isn't to copy DMDScript without the license,
 the objective is to create an engine that's significantly better.  At
 the moment, I would say roughly half the code is written and I'm
 using 108KB vs DMDScript's 513KB.  The parser is the only remaining
 component before it can (incorrectly) run javascript files.  The rest
 is debugging.
 
So is the idea to run the interpreter inside the parsing engine, or are you going to generate opcodes? It wasn't quite clear.

Regards
Alan
Jan 02 2008
parent reply Dan <murpsoft hotmail.com> writes:
Alan Knowles Wrote:

   .
 
 Yes, except the object isn't to copy DMDScript without the license,
 the objective is to create an engine that's significantly better.  At
 the moment, I would say roughly half the code is written and I'm
 using 108KB vs DMDScript's 513KB.  The parser is the only remaining
 component before it can (incorrectly) run javascript files.  The rest
 is debugging.
 
so is the idea to run the interpreter inside the parsing engine? or are you going to generate opcodes? - It wasn't quite clear? Regards Alan
Would you believe me if I said combinations of both?

For now I want to do this:
0) interpret top-level, and compile functions and loops to unoptimized native for execution (probably default behavior)

Later, I'd like it to be able to:
1) tokenize everything and serialize the output.
2) interpret everything on-the-fly, using bytecode for loops and functions.
3) compile the whole program to unoptimized native and serialize the output.
4) run serialized token streams and serialized compiled scripts.

I'm aware that's a tall order. That's why I'm not scheduling all those for Walnut 2.0; they'll come with following minor versions.

Regards,
Dan
Jan 03 2008
parent reply Alan Knowles <alan akbkhome.com> writes:
Yes, sounds like a good approach.. -

Might be worth playing with naming the Parse methods around the grammar 
as documented in the ECMAScript spec.

eg. something like:

// methods? returns true if a statement was found?? or should it just throw 
// an exception???
bool Statement(bool execute=true)
{
	switch(tok) {
		case '{': // Block:
			while(Statement());
			if (tok != '}') throw Error....
		case TEXT.var: //   VariableStatement:
			while(VariableStatement());
			if (tok != ';') throw Error....			
		case ';': // EmptyStatement:
			return true;
		
// ExpressionStatement:  -- this may be tricky.. (it uses lookahead?)
		case TEXT.if:
			if (tok != '(') throw Error....
			bool doif = Expression(); // return true|false?
			if (!Statement(doif)) throw Error...
			if (tok == TEXT.else)
				if (!Statement(!doif)) throw Error...	
			
  IterationStatement:
  ContinueStatement:
  BreakStatement:
  ReturnStatment:
  WithStatement:
  LabelledStatement:
  SwitchStatement:
  ThrowStatement:
  TryStatement:


Regards
Alan


Dan wrote:
 Alan Knowles Wrote:
 
   .
 Yes, except the object isn't to copy DMDScript without the license,
 the objective is to create an engine that's significantly better.  At
 the moment, I would say roughly half the code is written and I'm
 using 108KB vs DMDScript's 513KB.  The parser is the only remaining
 component before it can (incorrectly) run javascript files.  The rest
 is debugging.
so is the idea to run the interpreter inside the parsing engine? or are you going to generate opcodes? - It wasn't quite clear? Regards Alan
Would you believe me if I said combinations of both? For now I want to do this: 0) interpret top-level, and compile functions and loops to unoptimized native for execution (probably default behavior) Later, I'd like it to be able to: 1) tokenize everything and serialize the output. 2) interpret everything on-the-fly, using bytecode for loops, functions 3) compile the whole program to unoptimized native and serialize the output. 4) run serialized token streams, and serialized compiled scripts. I'm aware that's a tall order. That's why I'm not scheduling all those for Walnut 2.0. They'll come with following minor versions. Regards, Dan
Jan 03 2008
parent reply Alan Knowles <alan akbkhome.com> writes:
That little snippet reminded me of another trick:

Don't create Tokens for single-character tokens - eg. -, =, ", ', ..... etc.
Start the Token Enum from 127.

In the case of your Value Type Enum, this should be ok. (may take a bit 
of fudging with the Typedefs on Value.type / Enum creation.)

This enables you to do stuff like the example below:

switch(Value.type) {
     case ':':
     case Token.IF:
     case '=':

Which makes the code considerably more readable (no remembering what 
Token.LT and Token.GT were supposed to be...)
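To make it concrete (hypothetical members - pick whatever names you actually need):

enum Token : int
{
    IF = 128,           // anything past the single-byte character range works
    ELSE,
    IDENTIFIER,
    NUMBER,
    EOF,
}

void dispatch(int tok)
{
    switch (tok)
    {
        case ':':           // single-char tokens are just their char codes
        case '=':
            // ... handle punctuation ...
            break;
        case Token.IF:      // named tokens start above 127, so no collisions
            // ... handle the keyword ...
            break;
        default:
            break;
    }
}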

Regards
Alan




Alan Knowles wrote:
 Yes, sounds like a good approach.. -
 
 Might be worth playing with naming the Parse methods around the Grammer 
 as documented in the ECMAScript spec.
 
 eg. something like:
 
 // methods? returns 1 if statement found?? or should it just throw an 
 exception???
 bool Statement(bool execute=true)
 {
     switch(tok) {
         case '{': // Block:
             while(Statement());
             if (tok != '}') throw Error....
         case TEXT.var: //   VariableStatement:
             while(VariableStatement());
             if (tok != ';') throw Error....           
         case ';': // EmtpyStatement:
             return 1;
        
 ExpressionStatement:  -- this may be tricky.. (it uses lookahead?)
         case TEXT.if:
             if (tok != '(') throw Error....
             bool doif = Expression(); // return true|false?
             if (!Statement(doif)) throw Error...
             if tok == TEXT.else
                 if (!Statement(!doif)) throw Error...   
            
  IterationStatement:
  ContinueStatement:
  BreakStatement:
  ReturnStatment:
  WithStatement:
  LabelledStatement:
  SwitchStatement:
  ThrowStatement:
  TryStatement:
 
 
 Regards
 Alan
 
 
 Dan wrote:
 Alan Knowles Wrote:

   .
 Yes, except the object isn't to copy DMDScript without the license,
 the objective is to create an engine that's significantly better.  At
 the moment, I would say roughly half the code is written and I'm
 using 108KB vs DMDScript's 513KB.  The parser is the only remaining
 component before it can (incorrectly) run javascript files.  The rest
 is debugging.
so is the idea to run the interpreter inside the parsing engine? or are you going to generate opcodes? - It wasn't quite clear? Regards Alan
Would you believe me if I said combinations of both? For now I want to do this: 0) interpret top-level, and compile functions and loops to unoptimized native for execution (probably default behavior) Later, I'd like it to be able to: 1) tokenize everything and serialize the output. 2) interpret everything on-the-fly, using bytecode for loops, functions 3) compile the whole program to unoptimized native and serialize the output. 4) run serialized token streams, and serialized compiled scripts. I'm aware that's a tall order. That's why I'm not scheduling all those for Walnut 2.0. They'll come with following minor versions. Regards, Dan
Jan 03 2008
parent reply Alan Knowles <alan akbkhome.com> writes:
Just checking the code against this - why are you not using Type|Token 
numbers for the keywords?
...
	case TEXT_case:
		v.s = TEXT_case;
		v.type = TYPE.KEYWORD;
		return v;
...

would work a lot better as:
	
	switch(word) {
		case "case" : v.type   = Token.CASE;  return v;
		case "if" :   v.type   = Token.IF;    return v;
		case "else" : v.type   = Token.ELSE;  return v;		
		....
Thinking about this - it may be a good idea to
alias TYPE Token;

It will give you quite a good readability gain.
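Put together, the keyword path would read something like this (hypothetical member names; assumes the named tokens already start above 127 as per my earlier post):

enum TYPE : int { UNDEFINED, NUMBER, STRING, OBJECT, CASE = 128, IF, ELSE }
alias TYPE Token;               // Token.IF reads much better than TYPE.IF in a parser

Token keywordToken(string word)
{
    switch (word)
    {
        case "case": return Token.CASE;
        case "if":   return Token.IF;
        case "else": return Token.ELSE;
        default:     return Token.UNDEFINED;    // i.e. an ordinary identifier
    }
}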

Regards
Alan







Alan Knowles wrote:
 
 That little snippet reminded me of another trick:
 
 Dont create Tokens for single character Tokens - eg. -,=,",',..... etc.
 Start the Token Enum from 127.
 
 In the case of your Value Type Enum, this should be ok. (may take a bit 
 of fudging with the Typedefs on Value.type / Enum creation.)
 
 This enables you to do stuff like the example below:
 
 switch(Value.type) {
     case ':':
     case Token.IF:
     case '=':
 
 Which makes the code considerably more readable. (no remembering what 
 Token.LT Token.GT where supposed to be...
 
 Regards
 Alan
 
 
 
 
 Alan Knowles wrote:
 Yes, sounds like a good approach.. -

 Might be worth playing with naming the Parse methods around the 
 Grammer as documented in the ECMAScript spec.

 eg. something like:

 // methods? returns 1 if statement found?? or should it just throw an 
 exception???
 bool Statement(bool execute=true)
 {
     switch(tok) {
         case '{': // Block:
             while(Statement());
             if (tok != '}') throw Error....
         case TEXT.var: //   VariableStatement:
             while(VariableStatement());
             if (tok != ';') throw Error....                   case 
 ';': // EmtpyStatement:
             return 1;
        ExpressionStatement:  -- this may be tricky.. (it uses lookahead?)
         case TEXT.if:
             if (tok != '(') throw Error....
             bool doif = Expression(); // return true|false?
             if (!Statement(doif)) throw Error...
             if tok == TEXT.else
                 if (!Statement(!doif)) throw Error...              
  IterationStatement:
  ContinueStatement:
  BreakStatement:
  ReturnStatment:
  WithStatement:
  LabelledStatement:
  SwitchStatement:
  ThrowStatement:
  TryStatement:


 Regards
 Alan


 Dan wrote:
 Alan Knowles Wrote:

   .
 Yes, except the object isn't to copy DMDScript without the license,
 the objective is to create an engine that's significantly better.  At
 the moment, I would say roughly half the code is written and I'm
 using 108KB vs DMDScript's 513KB.  The parser is the only remaining
 component before it can (incorrectly) run javascript files.  The rest
 is debugging.
so is the idea to run the interpreter inside the parsing engine? or are you going to generate opcodes? - It wasn't quite clear? Regards Alan
Would you believe me if I said combinations of both? For now I want to do this: 0) interpret top-level, and compile functions and loops to unoptimized native for execution (probably default behavior) Later, I'd like it to be able to: 1) tokenize everything and serialize the output. 2) interpret everything on-the-fly, using bytecode for loops, functions 3) compile the whole program to unoptimized native and serialize the output. 4) run serialized token streams, and serialized compiled scripts. I'm aware that's a tall order. That's why I'm not scheduling all those for Walnut 2.0. They'll come with following minor versions. Regards, Dan
Jan 03 2008
parent reply Alan Knowles <alan akbkhome.com> writes:
Not sure if it's feasible to keep a 2.0 + gdc build working - but these 
changes (attached) work for gdc. It looks like the const(Value) / const(char) 
syntax completely throws gdc, even if it's inside a version(D_Version2) block.
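For example, even something as innocuous as this seems to be enough to upset it - the skipped branch still has to get through gdc's D1 parser:

version (D_Version2)
{
    const(char)[] name;     // D2-only syntax: old gdc chokes while merely parsing this
}
else
{
    char[] name;
}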

Do you have a non-hotmail address? Hotmail is notorious for just trashing 
emails without notice... - hope this gets through.

Or should we take the discussion onto the DMDScript newsgroup?


Regards
Alan
Jan 04 2008
parent Dan Lewis <murpsoft hotmail.com> writes:
Hi Alan,

Sorry it took so long to reply.  I must have left just as you were getting
started.  I definitely like the idea of using charcodes as token types for some
tokens.

Yesterday I tried declaring an enum TEXT : static const(char)[] and couldn't get
it working, even down to char[].  : p

Perchance you know how?
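For reference, the shape of what I was after is roughly this (D 2.x syntax, made-up members - no idea whether gdc swallows it):

enum TEXT : string
{
    undefined = "undefined",
    var       = "var",
    case_     = "case",     // 'case' itself is a D keyword, hence the underscore
}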

I'll examine the diff and make changes as best I can.

We ought to move this to either the dsource.org Walnut forums, Skype
(murposaurus), MSN, hotmail or whatnot.  It's moved away from the subject matter
of digitalmars.D.learn.

Regards,
Dan
Jan 04 2008