
digitalmars.D - Lexer related questions

reply "Casper Ellingsen" <no reply.com> writes:
Hey,

I'm using JFlex (http://jflex.de/) to implement a lexical analyser for the
D language. I've already got quite a lot done, but there are some issues here
and there that I need to work on. Also, there are a couple of things I need
feedback on.

For example, I can't seem to understand why it's allowed to have several
consecutive _'s in a decimal/integer value. The grammar says

Decimal:
	0
	NonZeroDigit
	NonZeroDigit Decimal
	NonZeroDigit _ Decimal

which means that 0, 1, 12, 1_2 and 1_2_3 are allowed, but as I read the
grammar, 1__2__3 is not. The DMD compiler, however, accepts that value as
123.

Also, the specification (http://www.digitalmars.com/d/lex.html) seems to  
lack information on some parts of the grammar. For example, it says

Float:
	DecimalFloat
	HexFloat
	Float _

but it doesn't describe the grammar of either DecimalFloat or HexFloat.

I'll post more questions once I find other issues.
-- 
Using Opera's revolutionary e-mail client: http://www.opera.com/mail/
Jan 13 2006
next sibling parent reply Sean Kelly <sean f4.ca> writes:
Casper Ellingsen wrote:
 Hey,
 
 I'm using JFlex (http://jflex.de/) to implement a lexical analyser for 
 the D language. I've already got quite alot done, but there's some 
 issues here and there that I need to work on. Also, there's a couple 
 things I need feedback on.
 
 For example, I can't seem to understand why it's allowed to have several 
 succeeding _'s in a decimal/integer value. The grammer says
 
 Decimal:
     0
     NonZeroDigit
     NonZeroDigit Decimal
     NonZeroDigit _ Decimal
 
 which means that 0, 1, 12, 1_2 and 1_2_3 is allowed, but in my opinion, 
 1__2__3 is not allowed. The DMD compiler, however, accepts that value as 
 123.
The D regexp and BNF information is woefully inaccurate in places, largely because Walter wrote DMD entirely by hand. You're best off verifying it against the written documentation: http://digitalmars.com/d/lex.html#integerliteral "Integers can have embedded '_' characters, which are ignored."
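To make the difference concrete, here is a minimal Java sketch (JFlex patterns are close to java.util.regex, so it should carry over). AS_WRITTEN is my literal transcription of the Decimal rule quoted above; PERMISSIVE is only my assumed reading of the "embedded '_' characters, which are ignored" sentence, not something taken from the DMD source. The class and method names are made up for illustration.

```java
import java.util.regex.Pattern;

/** Sketch comparing the published Decimal rule with the "underscores are ignored" prose. */
final class DecimalLiteralSketch {
    // Literal transcription of the BNF:
    //   Decimal: 0 | NonZeroDigit | NonZeroDigit Decimal | NonZeroDigit _ Decimal
    static final Pattern AS_WRITTEN = Pattern.compile("(?:[1-9]_?)*[0-9]");

    // Assumed reading of the prose: a non-zero digit, then any mix of digits and underscores.
    static final Pattern PERMISSIVE = Pattern.compile("0|[1-9][0-9_]*");

    static boolean matchesAsWritten(String s) {
        return AS_WRITTEN.matcher(s).matches();
    }

    static boolean matchesPermissive(String s) {
        return PERMISSIVE.matcher(s).matches();
    }

    /** Drops the underscores and parses what is left as a decimal number. */
    static long value(String literal) {
        if (!matchesPermissive(literal)) {
            throw new IllegalArgumentException("not a decimal literal: " + literal);
        }
        return Long.parseLong(literal.replace("_", ""));
    }

    public static void main(String[] args) {
        for (String s : new String[] {"0", "12", "1_2_3", "1__2__3", "100"}) {
            System.out.println(s + " as written: " + matchesAsWritten(s)
                    + ", permissive: " + matchesPermissive(s));
        }
        System.out.println(value("1__2__3"));
    }
}
```

If I have transcribed the rule correctly, the BNF as written rejects not only 1__2__3 but also 100, since 0 may only appear as the final digit. That is another hint that the prose is the better guide here.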
 Also, the specification (http://www.digitalmars.com/d/lex.html) seems to 
 lack information on some parts of the grammar. For example, it says
 
 Float:
     DecimalFloat
     HexFloat
     Float _
 
 but it doesn't describe the grammar of DecimalFloat nor HexFloat.
Same thing here. Check this link: http://digitalmars.com/d/lex.html#floatliteral

Though I suspect that aside from the embedded underscores, the syntax is identical to what it is in C/C++. Here's the pertinent bit of the C++ standard:

floating-literal:
    fractional-constant exponent-part(opt) floating-suffix(opt)
    digit-sequence exponent-part floating-suffix(opt)
fractional-constant:
    digit-sequence(opt) . digit-sequence
    digit-sequence .
exponent-part:
    e sign(opt) digit-sequence
    E sign(opt) digit-sequence
sign: one of
    + -
digit-sequence:
    digit
    digit-sequence digit
floating-suffix: one of
    f l F L
Jan 13 2006
parent "Casper Ellingsen" <no reply.com> writes:
On Fri, 13 Jan 2006 23:02:31 +0100, Sean Kelly <sean f4.ca> wrote:

 Though I suspect that aside from the embedded underscores, the syntax is  
 identical to what it is in C/C++.  Here's the pertinent bit of the C++  
 standard:

 floating-literal:
 	fractional-constant exponent-part(opt) floating-suffix(opt)
 	digit-sequence exponent-part floating-suffix(opt)
 fractional-constant:
 	digit-sequence(opt) . digit-sequence
 	digit-sequence .
 exponent-part:
 	e sign(opt) digit-sequence
 	E sign(opt) digit-sequence
 sign: one of
 	+ -
 digit-sequence:
 	digit
 	digit-sequence digit
 floating-suffix: one of
 	f l F L
Thanks. As far as I can tell, this syntax is the same as for D, except for the floating-suffix, which has no imaginary part in C/C++. That's an easy fix though. I already added it to the JFlex file, and it seems to work perfectly. Now I'll move on to hex floats.
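For reference, this is roughly what the rule looks like as a Java regular expression (close enough to a JFlex macro to translate by hand). It follows the C++ grammar quoted above. The suffix set (f, F or L, optionally followed by an imaginary i) and the way underscores are allowed inside digit sequences are my assumptions about D, so check them against the spec before relying on them. All names are made up for illustration.

```java
import java.util.regex.Pattern;

/** Sketch of a decimal float pattern built from the quoted C++ grammar. */
final class DecimalFloatSketch {
    // digit-sequence, with embedded underscores allowed after the first digit (assumption)
    private static final String DIGITS = "[0-9][0-9_]*";

    // fractional-constant: digit-sequence(opt) . digit-sequence | digit-sequence .
    private static final String FRACTION =
            "(?:(?:" + DIGITS + ")?\\." + DIGITS + "|" + DIGITS + "\\.)";

    // exponent-part: e or E, optional sign, digit-sequence
    private static final String EXPONENT = "[eE][+-]?" + DIGITS;

    // Assumed D suffixes: optional f, F or L, then an optional imaginary i
    private static final String SUFFIX = "[fFL]?i?";

    static final Pattern FLOAT = Pattern.compile(
            "(?:" + FRACTION + "(?:" + EXPONENT + ")?|" + DIGITS + EXPONENT + ")" + SUFFIX);

    static boolean isDecimalFloat(String s) {
        return FLOAT.matcher(s).matches();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"1.5", ".5", "1e10", "1_000.5f", "2.0Li", "123"}) {
            System.out.println(s + " -> " + isDecimalFloat(s));
        }
    }
}
```

A plain digit sequence such as 123 is deliberately not matched, because the grammar requires either a fractional part or an exponent.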
Jan 13 2006
prev sibling next sibling parent reply "Casper Ellingsen" <no reply.com> writes:
On Fri, 13 Jan 2006 22:38:18 +0100, Casper Ellingsen <no reply.com> wrote:

This is more a parser related question, but still, here goes: What  
visibility will the following function have, and why is it even legal to  
use more than one visibility keyword in combination like that? I mean, is  
it anything but confusing?

public package private foo(int i) {
	writefln(i);
}

Also, how accurate is the BNF in  
http://www.digitalmars.com/d/declaration.html?
Jan 14 2006
next sibling parent reply Sean Kelly <sean f4.ca> writes:
Casper Ellingsen wrote:
 On Fri, 13 Jan 2006 22:38:18 +0100, Casper Ellingsen <no reply.com> wrote:
 
 This is more a parser related question, but still, here goes: What 
 visibility will the following function have, and why is it even legal to 
 use more than one visibility keyword in combination like that? I mean, 
 is it anything but confusing?
 
 public package private foo(int i) {
     writefln(i);
 }
I'd guess it would be private, and equivalent to the following:

public:
package:
private:
void foo(int i);
 Also, how accurate is the BNF in 
 http://www.digitalmars.com/d/declaration.html?
It looks pretty close, at a glance. But perhaps someone who's spent more time with the D parser could offer a more informed opinion.

Sean
Jan 14 2006
parent "Casper Ellingsen" <no reply.com> writes:
On Sun, 15 Jan 2006 06:12:13 +0100, Sean Kelly <sean f4.ca> wrote:

 Casper Ellingsen wrote:
 On Fri, 13 Jan 2006 22:38:18 +0100, Casper Ellingsen <no reply.com>  
 wrote:
  This is more a parser related question, but still, here goes: What  
 visibility will the following function have, and why is it even legal  
 to use more than one visibility keyword in combination like that? I  
 mean, is it anything but confusing?
  public package private foo(int i) {
     writefln(i);
 }
I'd guess it would be private, and equivalent to the following:

public:
package:
private:
void foo(int i);
Yes, that could make sense. I haven't had the time to confirm this yet though.
 Also, how accurate is the BNF in  
 http://www.digitalmars.com/d/declaration.html?
It looks pretty close, at a glance. But perhaps someone who's spent more time with the D parser could offer a more informed opinion.
Some of it looks correct, but other parts confuse me, like the '() Declarator' part of the Declarator rule. Can someone please give me an example that uses this rule? Also, isn't the last Declarator rule redundant?

Declarator:
    BasicType2 Declarator
    Identifier
    () Declarator
    Identifier DeclaratorSuffixes
    () Declarator DeclaratorSuffixes
Jan 14 2006
prev sibling parent Hasan Aljudy <hasan.aljudy gmail.com> writes:
Casper Ellingsen wrote:
 On Fri, 13 Jan 2006 22:38:18 +0100, Casper Ellingsen <no reply.com> wrote:
 
 Also, how accurate is the BNF in  
 http://www.digitalmars.com/d/declaration.html?
I don't really know. I'm toying with making a parser .. I couldn't use the grammar exactly as it was there .. too confusing. I tried to come up with my own description of the grammar .. it's not complete, mind you. I introduced some new rules to resolve some ambiguities (actually, to work around them). It's very experimental (and incomplete) at the moment. Use with care (if you ever use it anyway).

Note that I didn't include any keywords (i.e. int, float, etc.) in Type, because I don't lex them as keywords, but as Identifiers.

I'm not even sure how accurate it is, but here it is anyway:

Declaration:
    Type Declarator ;
    Type Declarator , DeclIdentifierList ;
    Type Declarator Parameters ;
    Type Declarator Parameters FunctionBody

Type:
    IdentifierSequence
    IdentifierSequence TypeSuffixes

TypeSuffixes:
    TypeSuffix
    TypeSuffix TypeSuffixes

TypeSuffix:
    Pointer
    Array
    FunctionPointer
    Delegate

Pointer:
    *

Array:
    []
    [ ExprType ]

ExprType:
    AssignExpression
    AssignExpression TypeSuffixes

FunctionPointer:
    function Parameters

Delegate:
    delegate Parameters

Declarator:
    Identifier
    Declarator CTypeSuffixes
    Declarator = Initializer
    ( Declarator )
    ( TypeSuffixes Declarator )

CTypeSuffixes:
    Array
    Array CTypeSuffixes

DeclIdentifierList:
    DeclIdentifier
    DeclIdentifier, DeclIdentifierList

DeclIdentifier:
    Identifier
    Identifier = Initializer

IdentifierSequence:
    IdentifierList
    .IdentifierList
    IdentifierSequence ! TemplateArguments

IdentifierList:
    Identifier
    Identifier.IdentifierList

TemplateArguments:
    ( TemplateArgumentList )

TemplateArgumentList:
    TemplateArgument
    TemplateArgument, TemplateArgumentList

TemplateArgument:
    ExprType

Initializer:
    void
    AssignExpression
    ArrayInitializer
    StructInitializer

ArrayInitializer:
    [ ArrayMemberInitializations ]
    [ ]

ArrayMemberInitializations:
    ArrayMemberInitialization
    ArrayMemberInitialization ,
    ArrayMemberInitialization , ArrayMemberInitializations

ArrayMemberInitialization:
    AssignExpression
    AssignExpression : AssignExpression

StructInitializer:
    { }
    { StructMemberInitializers }

StructMemberInitializers:
    StructMemberInitializer
    StructMemberInitializer ,
    StructMemberInitializer , StructMemberInitializers

StructMemberInitializer:
    AssignExpression
    Identifier : AssignExpression

Parameters:
    ( )
    ( ParameterList )

ParameterList:
    Parameter
    Parameter, ParameterList

Parameter:
    Type
    Type Declarator
    Type Declarator = Initializer
    InOut Parameter

InOut:
    in
    out
    inout

FunctionBody:
    StatementBlock
    FunctionContracts body StatementBlock

FunctionContracts:
    InContract
    OutContract
    InContract OutContract
    OutContract InContract

InContract:
    in StatementBlock

OutContract:
    out StatementBlock
    out ( Identifier ) StatementBlock
Jan 15 2006
prev sibling next sibling parent reply "Casper Ellingsen" <no reply.com> writes:
On Fri, 13 Jan 2006 22:38:18 +0100, Casper Ellingsen <no reply.com> wrote:

A version condition is defined in  
http://www.digitalmars.com/d/version.html as

	VersionCondition:
		version () Integer
		version () Identifier

One valid version condition is

	version(X86)

so why aren't the BNF rules defined as

	VersionCondition:
		version ( Integer )
		version ( Identifier )

instead? It just seems odd to me, and really confused me for a while.
Jan 15 2006
parent reply Don Clugston <dac nospam.com.au> writes:
Casper Ellingsen wrote:
 On Fri, 13 Jan 2006 22:38:18 +0100, Casper Ellingsen <no reply.com> wrote:
 
 A version condition is defined in  
 http://www.digitalmars.com/d/version.html as
 
     VersionCondition:
         version () Integer
         version () Identifier
 
 One valid version condition is
 
     version(X86)
 
 so why isn't the BNF rules defined as
 
     VersionCondition:
         version ( Integer )
         version ( Identifier )
 
 instead? It just seems odd to me, and really confused me for a while.
The parentheses are in the wrong place all through the docs. I think it's a ddoc problem (the docs weren't updated properly when they were converted to Ddoc).
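In other words, the form the compiler actually accepts is version ( Integer ) or version ( Identifier ). A small Java sketch of a recognizer for that form, just to pin down what is meant; the class name, the method name and the simplified Integer and Identifier patterns are mine, not from the spec:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Sketch: recognise "version ( Integer )" and "version ( Identifier )". */
final class VersionConditionSketch {
    // Whitespace is allowed between tokens; Integer and Identifier are simplified (assumption).
    private static final Pattern VERSION = Pattern.compile(
            "version\\s*\\(\\s*([0-9]+|[A-Za-z_][A-Za-z_0-9]*)\\s*\\)");

    /** Returns the argument of a version condition, or null if the text is not one. */
    static String argument(String text) {
        Matcher m = VERSION.matcher(text);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(argument("version(X86)"));
        System.out.println(argument("version ( 3 )"));
        System.out.println(argument("version () X86"));
    }
}
```

The last call shows that the form printed in the docs, with the parentheses before the argument, is rejected.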
Jan 15 2006
parent reply Bruno Medeiros <daiphoenixNO SPAMlycos.com> writes:
Don Clugston wrote:
 Casper Ellingsen wrote:
 
 On Fri, 13 Jan 2006 22:38:18 +0100, Casper Ellingsen <no reply.com> 
 wrote:

 A version condition is defined in  
 http://www.digitalmars.com/d/version.html as

     VersionCondition:
         version () Integer
         version () Identifier

 One valid version condition is

     version(X86)

 so why isn't the BNF rules defined as

     VersionCondition:
         version ( Integer )
         version ( Identifier )

 instead? It just seems odd to me, and really confused me for a while.
The parentheses are in the wrong place all through the docs.
Indeed. I've wondered if that was wrong, or if it was just a different kind of notation for the grammar, that I was unfamiliar with, since I'm no expert in this subject.
 I think 
 it's a ddoc problem (the docs weren't updated properly when they were 
 converted to Ddoc).
Hmm... What does the grammar doc have to do with ddoc?

-- 
Bruno Medeiros - CS/E student
"Certain aspects of D are a pathway to many abilities some consider to be... unnatural."
Jan 18 2006
parent Don Clugston <dac nospam.com.au> writes:
Bruno Medeiros wrote:
 Don Clugston wrote:
 
 Casper Ellingsen wrote:

 On Fri, 13 Jan 2006 22:38:18 +0100, Casper Ellingsen <no reply.com> 
 wrote:

 A version condition is defined in  
 http://www.digitalmars.com/d/version.html as

     VersionCondition:
         version () Integer
         version () Identifier

 One valid version condition is

     version(X86)

 so why isn't the BNF rules defined as

     VersionCondition:
         version ( Integer )
         version ( Identifier )

 instead? It just seems odd to me, and really confused me for a while.
The parentheses are in the wrong place all through the docs.
Indeed. I've wondered if that was wrong, or if it was just a different kind of notation for the grammar, that I was unfamiliar with, since I'm no expert in this subject.
 I think it's a ddoc problem (the docs weren't updated properly when 
 they were converted to Ddoc).
Hum... What does the grammar doc have anything to do with ddoc ?
Nothing, except that they are no longer written in HTML, they're .ddoc files which are converted into HTML (so that they get proper D code colouring, etc). Funny things happened to the ampersands (in ddoc you can write &, in HTML it must be &amp;), and apparently the parentheses, too.
Jan 18 2006
prev sibling parent reply "Casper Ellingsen" <no reply.com> writes:
There are two conflicting definitions of postfix expressions in
http://www.digitalmars.com/d/expression.html. In the BNF at the top a  
postfix expression is defined as

	PostfixExpression:
		PrimaryExpression
		PostfixExpression . Identifier
		PostfixExpression ++
		PostfixExpression --
		PostfixExpression ( )
		PostfixExpression ( ArgumentList )
		IndexExpression
		SliceExpression

	IndexExpression:
		PostfixExpression [ ArgumentList ]

	SliceExpression:
		PostfixExpression [ ]
		PostfixExpression [ AssignExpression .. AssignExpression ]

On the other hand, in the textual description further down, a postfix  
expression is defined as

	PostfixExpression:
		PostfixExpression . Identifier
		PostfixExpression -> Identifier
		PostfixExpression ++
		PostfixExpression --
		PostfixExpression ( ArgumentList )
		PostfixExpression [ ArgumentList ]
		PostfixExpression [ AssignExpression .. AssignExpression ]

The first one has

		PostfixExpression ( )
		PostfixExpression [ ]

which the second one doesn't have, whereas the second one has

		PostfixExpression -> Identifier

which the first one doesn't have. What's the correct definition? Oh, if  
only the BNF grammar was correct. :/
Jan 16 2006
parent Hasan Aljudy <hasan.aljudy gmail.com> writes:
Casper Ellingsen wrote:
 There's two conflicting definitions of postfix expressions in  
 http://www.digitalmars.com/d/expression.html. In the BNF at the top a  
 postfix expression is defined as
 
     PostfixExpression:
         PrimaryExpression
         PostfixExpression . Identifier
         PostfixExpression ++
         PostfixExpression --
         PostfixExpression ( )
         PostfixExpression ( ArgumentList )
         IndexExpression
         SliceExpression
 
     IndexExpression:
         PostfixExpression [ ArgumentList ]
 
     SliceExpression:
         PostfixExpression [ ]
         PostfixExpression [ AssignExpression .. AssignExpression ]
 
 On the other hand, in the textual description further down, a postfix  
 expression is defined as
 
     PostfixExpression:
         PostfixExpression . Identifier
         PostfixExpression -> Identifier
         PostfixExpression ++
         PostfixExpression --
         PostfixExpression ( ArgumentList )
         PostfixExpression [ ArgumentList ]
         PostfixExpression [ AssignExpression .. AssignExpression ]
 
 The first one has
 
         PostfixExpression ( )
         PostfixExpression [ ]
 
 which the second one doesn't have, whereas the second one has
 
         PostfixExpression -> Identifier
 
 which the first one doesn't have. What's the correct definition? Oh, if  
 only the BNF grammar was correct. :/
Obviously the second one is obsolete. D doesn't have the -> operator, though it seems to have had it in the past. Also, [] on an expression is a slice operator, which (I think) is equivalent to [0..$].
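Whichever definition is right, the rule is left-recursive, so a hand-written parser usually handles it with a loop: parse the primary expression once, then keep consuming suffixes. Here is a minimal Java sketch of that idea, following the first (BNF) definition. It works on a ready-made token list, and anything inside ( ) or [ ] is simplified to a single identifier or integer token; those simplifications and all the names are mine.

```java
import java.util.Arrays;
import java.util.List;

/** Sketch: parse the left-recursive PostfixExpression rule with a loop. */
final class PostfixSketch {
    private final List<String> tokens;
    private int pos = 0;

    PostfixSketch(List<String> tokens) { this.tokens = tokens; }

    private String peek() { return pos < tokens.size() ? tokens.get(pos) : ""; }

    private String next() {
        if (pos >= tokens.size()) throw new IllegalStateException("unexpected end of input");
        return tokens.get(pos++);
    }

    private void expect(String wanted) {
        String got = next();
        if (!got.equals(wanted)) throw new IllegalStateException("expected " + wanted + ", got " + got);
    }

    // Simplification: every operand is a single identifier or integer token.
    private String operand() {
        String t = next();
        if (!t.matches("[A-Za-z_][A-Za-z_0-9]*|[0-9]+")) throw new IllegalStateException("bad operand " + t);
        return t;
    }

    /** Returns a parenthesised description of the parsed expression. */
    String parsePostfix() {
        String expr = operand();                       // PrimaryExpression
        while (true) {                                 // each pass consumes one suffix
            switch (peek()) {
                case ".":  next(); expr = "(member " + expr + " " + operand() + ")"; break;
                case "++": next(); expr = "(post++ " + expr + ")"; break;
                case "--": next(); expr = "(post-- " + expr + ")"; break;
                case "(":
                    next();
                    if (peek().equals(")")) { next(); expr = "(call " + expr + ")"; break; }
                    expr = "(call " + expr + " " + operand() + ")";
                    expect(")");
                    break;
                case "[":
                    next();
                    if (peek().equals("]")) { next(); expr = "(slice " + expr + ")"; break; }
                    String first = operand();
                    if (peek().equals("..")) {
                        next();
                        expr = "(slice " + expr + " " + first + " " + operand() + ")";
                    } else {
                        expr = "(index " + expr + " " + first + ")";
                    }
                    expect("]");
                    break;
                default:
                    return expr;                       // no more suffixes
            }
        }
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("a", ".", "b", "(", ")", "[", "1", "..", "2", "]", "++");
        System.out.println(new PostfixSketch(input).parsePostfix());
    }
}
```

Note that the empty forms ( ) and [ ] need a one-token lookahead after the opening bracket, which is probably why the first definition lists them separately.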
Jan 16 2006