D - [BUG] dmd does not implement LR analysis
- Manfred Nowak (15/15) Mar 12 2004 Although not explicitly specified, the usual left-to-right lexical analysis
- Walter (4/18) Mar 13 2004 ... is a valid token. You'll need to put the space after the first . to ...
- Manfred Nowak (31/34) Mar 13 2004 I am not talking about meanings I wish. I noticed this departure from th...
- Stewart Gordon (28/44) Mar 16 2004 You're right, that syntax highlighters that are strictly LR have trouble...
- Matthew (3/14) Mar 16 2004 I think the cast operator should be mandatory
- J C Calvarese (6/27) Mar 16 2004 I absolutely agree. It has to be now. Before D 1.0 is set and we have a
- Matthew (5/29) Mar 17 2004 Quite right. Let me presumptuously institute a vote.
- Manfred Nowak (8/21) Mar 17 2004 Thanks for this link.
- Stewart Gordon (14/19) Mar 17 2004 2. . 4
- Manfred Nowak (4/5) Mar 17 2004 Agreed. I did not think of this argument.
- Ben Hinkle (6/30) Mar 13 2004 Fortran, MATLAB and Python use : for slicing instead of ..
- C. Sauls (7/10) Mar 14 2004 MOO uses '..' as well, and having recently written a MOO
- Stewart Gordon (10/14) Mar 15 2004
- Manfred Nowak (21/22) Mar 15 2004 context free is an attribute that belongs to grammars. At your will dmd
- larry cowan (6/16) Mar 15 2004 For what it's worth, .5+4. and 4.+.5 both work as expected, equaling 4.5...
Although not explicitly specified, the usual left-to-right lexical analysis and parsing of the grammar of D is currently not implemented in dmd.

Currently `2.' and `.4' are legal real numbers. Therefore the look-alike range `[cast(int)2..4]' is not a range. It should be analysed as two consecutive real numbers, as if it were written as `[cast(int)2. .4]', and should therefore yield something like:

| found '0.4' when expecting ']'

In its lexical analysis phase dmd uses some trickery to prevent this, i.e. looking ahead and backing up. On the other hand, this trickery now causes dmd to misidentify the legal range expression `[cast(int)2...4]', which could be written as `[cast(int)2. .. 4]'. dmd yields:

| found '...' when expecting ']'

So long.
Mar 12 2004
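Below is a minimal Python sketch of the strict left-to-right, longest-match tokenisation described above. The token set is a hypothetical subset of D: integer literals, float literals written like `2.', `.4' or `2.5', the dot tokens `.', `..' and `...', brackets, and identifiers. It is an assumption for illustration and is not taken from dmd or the spec. Under that assumption `[cast(int)2..4]' comes out as two consecutive reals, and `2...4' as `2.' followed by `..'.

```python
import re

# Hypothetical token patterns for a tiny subset of D (not taken from dmd).
TOKEN_PATTERNS = [
    re.compile(r"\d+\.\d*|\.\d+"),  # float literals: 2.  .4  2.5
    re.compile(r"\d+"),             # integer literals
    re.compile(r"\.\.\.|\.\.|\."),  # ...  ..  .
    re.compile(r"[\[\]()]"),        # brackets and parentheses
    re.compile(r"[A-Za-z_]\w*"),    # identifiers (keywords not distinguished)
]


def tokenize_maximal_munch(src):
    """Scan left to right, always taking the longest token that matches here."""
    tokens = []
    pos = 0
    while pos < len(src):
        if src[pos].isspace():
            pos += 1
            continue
        best = None
        for pattern in TOKEN_PATTERNS:
            m = pattern.match(src, pos)
            # Keep the longest match; on a tie the earlier pattern wins.
            if m and (best is None or m.end() > best.end()):
                best = m
        if best is None:
            raise SyntaxError(f"unexpected character {src[pos]!r} at {pos}")
        tokens.append(best.group())
        pos = best.end()
    return tokens


print(tokenize_maximal_munch("[cast(int)2..4]"))
# ['[', 'cast', '(', 'int', ')', '2.', '.4', ']']
print(tokenize_maximal_munch("2...4"))
# ['2.', '..', '4']
```

With a pure longest-match rule nothing backs up, so the `2.' is always consumed before the dots are considered.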
"Manfred Nowak" <svv1999 hotmail.com> wrote in message news:c2uekl$1995$1 digitaldaemon.com...Also not explicitely specified the usual left-to-right lexical analysis and parsing of the grammar of D is currently not implemented in dmd. Currently `2.' and `.4' are legal real numbers. Therefore the look alike range `[cast(int)2..4]' is not a range but should be analysed as two consecutive real numbers, as if it is written as `[cast(int)2. .4]', and therefore should yield something like: | found '0.4' when expecting ']' In the lexical analysis phase of dmd there has been done some trickery to prevent this, i.e. looking ahead and backing up. On the other hand this trickery prevents now, that the legal range expression `[cast(int)2...4]' which could be written as `[cast(int)2. .. 4]' is not correctly identified by dmd. dmd yields: | found '...' when expecting ']' So long.... is a valid token. You'll need to put the space after the first . to get the meaning you wish. True, the lexer does a bit of lookahead, but why not?
Mar 13 2004
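The "bit of lookahead" can be sketched by changing one rule of a longest-match lexer. Under this rule, a digit sequence does not absorb a following `.' if the character after that is also a `.'. The negative lookahead `(?!\.)' below stands in for the behaviour reported in this thread. It is an assumption about what dmd does observably, not dmd's source. With it, `2..4' lexes as `2' `..' `4' and `2...4' as `2' `...' `4', which matches the `found '...'' error. Walter's spaced form `2. .. 4' gives `2.' `..' `4'.

```python
import re

# Same hypothetical subset as a strict longest-match lexer, except that a
# number does not take a trailing '.' when another '.' follows it.
TOKEN_PATTERNS = [
    re.compile(r"\d+\.(?!\.)\d*|\.\d+"),  # 2.5  2.  .4, but not the '2.' of '2..'
    re.compile(r"\d+"),
    re.compile(r"\.\.\.|\.\.|\."),
    re.compile(r"[\[\]()]"),
    re.compile(r"[A-Za-z_]\w*"),
]


def tokenize_with_lookahead(src):
    """Longest-match scan, but with the one-character lookahead rule above."""
    tokens = []
    pos = 0
    while pos < len(src):
        if src[pos].isspace():
            pos += 1
            continue
        best = None
        for pattern in TOKEN_PATTERNS:
            m = pattern.match(src, pos)
            if m and (best is None or m.end() > best.end()):
                best = m
        if best is None:
            raise SyntaxError(f"unexpected character {src[pos]!r} at {pos}")
        tokens.append(best.group())
        pos = best.end()
    return tokens


print(tokenize_with_lookahead("2..4"))     # ['2', '..', '4']
print(tokenize_with_lookahead("2...4"))    # ['2', '...', '4']
print(tokenize_with_lookahead("2. .. 4"))  # ['2.', '..', '4']
```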
Walter wrote:
> ... is a valid token. You'll need to put the space after the first . to get the meaning you wish.

I am not talking about meanings I wish. I noticed this departure from the norm because the publicly available D syntax-highlighting extension for vim showed me `[2..4]' as two consecutive reals. That told me my own syntax-highlighting extension was wrong, because I had thought it was illegal for a real to have an empty integer or fractional part.

Following the usual left-to-right analysis, it is correct to analyse the construct in question as two consecutive reals. Furthermore, there is no way to build an LR highlighter that highlights the construct in question as two integer numbers separated by the range operator `..'. Even the `d2html' example highlights the construct in question as the real `2.', followed by a `.', followed by the integer `4'. I do not believe that any syntax highlighter currently out there is able to highlight the construct in question correctly.

> True, the lexer does a bit of lookahead, but why not?

That depends on what DigitalMars has in mind with the language D and the de facto reference compiler dmd.

Suppose the intention of DigitalMars is to tempt a certain number of computer nerds to the language D by promising an open standard, and at the same time to bind them to a proprietary implementation not fully consistent with the proposed standard and its more or less natural interpretation. Then it is quite okay to make even more departures than the two I have detected:
- the one which is the matter of this thread, and
- the `cast' operator being optional in dmd.

If the intention of DigitalMars is to keep the language D and the de facto reference compiler dmd in a homogeneous state, then the existence of both exposed deviations is not okay.

There might be more intentions of DigitalMars, which I am unable to recognize.

So long.
Mar 13 2004
Manfred Nowak wrote:
<snip>
> Even the `d2html' example highlights the construct in question as the real `2.', followed by a `.', followed by the integer `4'. I do not believe that any syntax highlighter currently out there is able to highlight the construct in question correctly.

You're right that syntax highlighters that are strictly LR have trouble with syntaxes that aren't strictly LR. But see below...

Depends on whether the lexicality is supposed to be strictly LR. But I did just notice this in the spec:

"There are no digraphs or trigraphs in D. The source text is split into tokens using the maximal munch technique, i.e., the lexical analyzer tries to make the longest token it can. For example >> is a right shift token, not two greater than tokens."

But if that's exactly true, then from the way string literals are specified, surely in qwert("yuiop", "asdfg") a single, 14-character string is being passed?

>> True, the lexer does a bit of lookahead, but why not?
>
> That depends on what DigitalMars has in mind with the language D and the de facto reference compiler dmd.

I think what it should have in mind is making the spec clearer. You're right, there's nothing suggesting that 2..4 should be 2 .. 4 and not 2. .4 or even any of the three other possibilities. Of course it isn't difficult to write a lexer that looks ahead two or three characters. The only trouble is that it's doing it for what's not clearly specified.

> Suppose the intention of DigitalMars is to tempt a certain number of computer nerds to the language D by promising an open standard, and at the same time to bind them to a proprietary implementation not fully consistent with the proposed standard and its more or less natural interpretation. Then it is quite okay to make even more departures than the two I have detected:
> - the one which is the matter of this thread, and
> - the `cast' operator being optional in dmd.
<snip>

You're right, that's just what I've been thinking for a while. There does seem to be both an inconsistency and a deviation from CFG with casts.

Stewart.

-- 
My e-mail is valid but not my primary mailbox, aside from its being the unfortunate victim of intensive mail-bombing at the moment. Please keep replies on the 'group where everyone may benefit.
Mar 16 2004
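The string-literal objection can be reproduced with a regular expression. This assumes a literal is read as `"', then any characters, then `"', with the longest match winning. Escape sequences are ignored in this sketch. The greedy pattern swallows both literals and the comma between them, giving a 14-character string. Excluding the delimiter from the allowed characters restores two separate literals.

```python
import re

SOURCE = 'qwert("yuiop", "asdfg")'

# Literal reading (assumption): '"', then any characters, then '"', longest match.
NAIVE_STRING = re.compile(r'"(.*)"')
# Delimiter excluded from the allowed characters (escapes still ignored).
FIXED_STRING = re.compile(r'"([^"]*)"')


def first_string_literal(pattern, src):
    """Return the contents of the first string literal the pattern finds in src."""
    m = pattern.search(src)
    return m.group(1) if m else None


naive = first_string_literal(NAIVE_STRING, SOURCE)
print(naive, len(naive))             # yuiop", "asdfg 14
print(FIXED_STRING.findall(SOURCE))  # ['yuiop', 'asdfg']
```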
>> [...] then it is quite okay to make even more departures than the two I have detected:
>> - the one which is the matter of this thread, and
>> - the `cast' operator being optional in dmd.
> <snip>
> You're right, that's just what I've been thinking for a while. There does seem to be both an inconsistency and a deviation from CFG with casts.

I think the cast operator should be mandatory
Mar 16 2004
Matthew wrote:
> <snip>
> I think the cast operator should be mandatory

I absolutely agree. It has to be now, before D 1.0 is set and we have a bunch of legacy code with C-style casts hanging around.

-- 
Justin
http://jcc_7.tripod.com/d/
Mar 16 2004
"J C Calvarese" <jcc7 cox.net> wrote in message news:c38i7b$un$1 digitaldaemon.com...Matthew wrote:computerIf the intention of DigitalMars is to tempt a certain amount ofcasts.timenerds to the language D by promising an open standard and at the samequitebind them to a proprietary implementation not fully consistent with the proposed standard and its somehow natural interpretation, then it isokay to make even more departures than the two I have detected: - the one which is the matter of this thread, and - the `cast' operator beeing optional in dmd.<snip> You're right, that's just what I've been thinking for a while. There does seem to be both an inconsistency and a deviation from CFG withQuite right. Let me presumptuously institute a vote.I think the cast operator should be mandatoryI absolutely agree. It has to be now. Before D 1.0 is set and we have a bunch of legacy code with C-style casts hanging around.
Mar 17 2004
Stewart Gordon wrote:
[...]
> "There are no digraphs or trigraphs in D. The source text is split into tokens using the maximal munch technique, i.e., the lexical analyzer tries to make the longest token it can. For example >> is a right shift token, not two greater than tokens."

Thanks for this link.

> But if that's exactly true, then from the way string literals are specified, surely in qwert("yuiop", "asdfg") a single, 14-character string is being passed?

Right. It should be specified that the allowed characters do not include the delimiting `"' or ``'.

> I think what it should have in mind is making the spec clearer. You're right, there's nothing suggesting that 2..4 should be 2 .. 4 and not 2. .4 or even any of the three other possibilities.

I see five, but only when not using longest match.

[...]

So long!
Mar 17 2004
Manfred Nowak wrote:
<snip>
>> I think what it should have in mind is making the spec clearer. You're right, there's nothing suggesting that 2..4 should be 2 .. 4 and not 2. .4 or even any of the three other possibilities.
>
> I see five, but only when not using longest match.

2. . 4
2 . .4
2 . . 4

Of course, the character sequence could be split up as

2.. 4
2 ..4

but these involve what aren't valid D tokens.

Stewart.
Mar 17 2004
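The five possibilities can be enumerated mechanically. The sketch below lists every way to split `2..4' completely into tokens from an assumed token set: integer literals, the float forms `2.', `.4' and `2.5', and the dot tokens `.', `..' and `...'. Under that assumption it finds exactly five splits. `2.. 4' and `2 ..4' never appear because `2..' and `..4' are not tokens.

```python
import re

# Assumed token set: integers, the float forms 2. / .4 / 2.5, and . .. ...
VALID_TOKEN = re.compile(r"\d+\.\d*|\.\d+|\d+|\.{1,3}")


def segmentations(src):
    """Return every way to split src completely into valid tokens."""
    if not src:
        return [[]]
    results = []
    for end in range(1, len(src) + 1):
        head = src[:end]
        if VALID_TOKEN.fullmatch(head):
            for rest in segmentations(src[end:]):
                results.append([head] + rest)
    return results


for split in segmentations("2..4"):
    print(" ".join(split))
# 2 . . 4
# 2 . .4
# 2 .. 4
# 2. . 4
# 2. .4
```

A longest-match lexer picks exactly one of these, `2. .4'. The intended `2 .. 4' is reachable only with a rule that deviates from longest match.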
Stewart Gordon wrote:
[...]
> but these involve what aren't valid D tokens.

Agreed. I did not think of this argument.

So long!
Mar 17 2004
On Sat, 13 Mar 2004 14:28:35 -0800, "Walter" <walter digitalmars.com> wrote:
> "Manfred Nowak" <svv1999 hotmail.com> wrote in message news:c2uekl$1995$1 digitaldaemon.com...
>> <snip>
>> | found '...' when expecting ']'
>
> ... is a valid token. You'll need to put the space after the first . to get the meaning you wish. True, the lexer does a bit of lookahead, but why not?

Fortran, MATLAB and Python use : instead of .. for slicing. I don't know the history of why, but maybe this parsing issue factored into it. The .. reminds me more of Pascal.

-Ben
Mar 13 2004
Ben Hinkle wrote:
> Fortran, MATLAB and Python use : instead of .. for slicing. I don't know the history of why, but maybe this parsing issue factored into it. The .. reminds me more of Pascal.

MOO uses '..' as well, and having recently written a MOO parser/compiler/driver I can say it's doable. Of course, MOO requires that floating-point numbers contain both an integer and a fractional part, even if one of them is 0, so maybe that makes all the difference.

-C. Sauls
-Invironz
Mar 14 2004
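The MOO observation can be checked with the same kind of sketch. Assume, for illustration, that a float literal must have digits on both sides of the point. Then plain longest-match scanning already yields `2' `..' `4' with no special lookahead. The single regular expression relies on ordered alternation. That coincides with longest match here only because each earlier alternative is a longer form of a later one (`\d+\.\d+' before `\d+', `...' before `..' before `.').

```python
import re

# Hypothetical MOO-like rule: a float literal needs digits on both sides of '.'.
# Alternatives are ordered longest form first, so the first one that matches
# is also the longest possible token at this position.
TOKEN = re.compile(r"\s*(\d+\.\d+|\d+|\.\.\.|\.\.|\.|[\[\]()]|[A-Za-z_]\w*)")


def tokenize_strict_floats(src):
    """Tokenize src under the strict float rule; trailing whitespace is ignored."""
    end = len(src.rstrip())
    tokens = []
    pos = 0
    while pos < end:
        m = TOKEN.match(src, pos)
        if m is None:
            raise SyntaxError(f"unexpected character at {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


print(tokenize_strict_floats("[2..4]"))  # ['[', '2', '..', '4', ']']
print(tokenize_strict_floats("2.5..4"))  # ['2.5', '..', '4']
```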
Manfred Nowak wrote:
<snip>
> Currently `2.' and `.4' are legal real numbers. Therefore the look-alike range `[cast(int)2..4]' is not a range. It should be analysed as two consecutive real numbers, as if it were written as `[cast(int)2. .4]', and should therefore yield something like:
<snip>

That's news to me. I'd imagined the tokenisation of D was supposed to be context-free.

Stewart.
Mar 15 2004
Stewart Gordon wrote:
[...]
> I'd imagined the tokenisation of D was supposed to be context-free.

"Context-free" is an attribute that belongs to grammars. If you will, dmd does not have a context-free lexical analysis, because the case "natural number followed by a point" is treated in a special way.

Lexical analysis is usually carried out left to right, by repeatedly finding the _longest_ prefix of the remaining source that forms a token. This is called LR analysis. I.e. `return2;' is the identifier `return2' followed by `;', not the keyword `return' followed by the integer number `2', followed by a `;'.

Not having an LR lexical analysis does not change whether the grammar is context-free, although it is a convention to combine LR lexical analysis with a context-free grammar. If D breaks this convention, it should be explicitly mentioned in the specification. If the non-LR analysis stays, then the door is open for more implicit deviations from the conventions, like the one I mentioned with `return2'. Even the suggestion of an operator that overrides the usual LR lexical analysis may arise. I would like `§$° ' to be supported then :-)

So long!
Mar 15 2004
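The `return2;' example is the standard keyword-versus-identifier case of longest match. In the hypothetical sketch below, the identifier rule consumes as many word characters as it can. Only afterwards is the lexeme checked against a tiny, made-up keyword table, so `return2' can never split into `return' and `2'.

```python
import re

KEYWORDS = {"return", "cast", "int"}  # tiny hypothetical subset of D keywords

# Alternatives are disjoint by first character, so ordered alternation is safe.
TOKEN = re.compile(r"(?P<word>[A-Za-z_]\w*)|(?P<int>\d+)|(?P<punct>[;()])")


def tokenize(src):
    """Longest-match scan: the whole word is read before the keyword check."""
    tokens = []
    pos = 0
    while pos < len(src):
        if src[pos].isspace():
            pos += 1
            continue
        m = TOKEN.match(src, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {src[pos]!r} at {pos}")
        if m.lastgroup == "word":
            kind = "KEYWORD" if m.group() in KEYWORDS else "IDENT"
        else:
            kind = m.lastgroup.upper()
        tokens.append((kind, m.group()))
        pos = m.end()
    return tokens


print(tokenize("return2;"))   # [('IDENT', 'return2'), ('PUNCT', ';')]
print(tokenize("return 2;"))  # [('KEYWORD', 'return'), ('INT', '2'), ('PUNCT', ';')]
```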
In article <c348v1$1o1i$1 digitaldaemon.com>, Stewart Gordon says...
> Manfred Nowak wrote:
> <snip>
>> Currently `2.' and `.4' are legal real numbers. [...]
> <snip>
> That's news to me. I'd imagined the tokenisation of D was supposed to be context-free.
>
> Stewart.

For what it's worth, .5+4. and 4.+.5 both work as expected, equaling 4.5. But I would rather have leading and trailing 0's required for literal floats, doubles, and reals. -(.5-4.), 4.-.5, 4.*-8., 4./.2, .1/16., and 04*20. all look pretty strange at first glance. I think FP literals should be more obviously differentiated from integer literals.
Mar 15 2004
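The proposal to require leading and trailing zeros can be expressed as a check over the literals in an expression. The sketch assumes only the simple forms discussed in this thread, with no exponents or suffixes. It flags every float literal that lacks digits on one side of the point. `bare_point_literals' is a hypothetical helper, not part of any D tool.

```python
import re

# Simple float forms only (no exponents or suffixes): 4.  .5  4.5
FLOAT_LITERAL = re.compile(r"\d+\.\d*|\.\d+")
# Proposed stricter form (hypothetical): digits on both sides of the point.
STRICT_FLOAT = re.compile(r"\d+\.\d+")


def bare_point_literals(expr):
    """Return the float literals in expr that the stricter rule would reject."""
    return [
        m.group()
        for m in FLOAT_LITERAL.finditer(expr)
        if not STRICT_FLOAT.fullmatch(m.group())
    ]


for expr in ["-(.5-4.)", "4.*-8.", "4./.2", "-(0.5-4.0)"]:
    print(expr, bare_point_literals(expr))
```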