#4002 — a parsing context with no always-correct lexical goal symbol for getting the next token


This bug is not against rev33 as published, but as modified according to
Bug 635 comment 9. (I could wait for rev34, but I don't think I should.)
Specifically, "[Lexical goal]" annotations have been removed from the grammar, and clause 11 has been modified to say something like:

If the context allows RegularExpressionLiteral,
use InputElementRegExp.
If the context allows TemplateMiddle or TemplateTail,
use InputElementTemplateTail.
Otherwise, use InputElementDiv.
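A minimal sketch of that rule as a function, for illustration only. It assumes a hypothetical parser-state object with two booleans, `allowsRegExp` and `allowsTemplateTail` (the latter standing for TemplateMiddle or TemplateTail). Neither name comes from the spec. The tests run in order, so a context that allows both token kinds falls into the first branch.

```javascript
// Sketch of the clause 11 rule above (rev33 as modified per Bug 635 comment 9).
// `state` is a hypothetical summary of which tokens the parser can accept next.
function pickLexicalGoal(state) {
  if (state.allowsRegExp) {
    return "InputElementRegExp";
  }
  if (state.allowsTemplateTail) {
    // TemplateMiddle and TemplateTail are both read with this goal.
    return "InputElementTemplateTail";
  }
  return "InputElementDiv";
}

// After `x = ` a regular expression may start; after `x` it may not.
console.log(pickLexicalGoal({ allowsRegExp: true, allowsTemplateTail: false })); // InputElementRegExp
console.log(pickLexicalGoal({ allowsRegExp: false, allowsTemplateTail: false })); // InputElementDiv
```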

In Bug 635 comment 3, I added:
(And you can note that the first two possibilities are [or should be]
mutually exclusive.)

In this bug, I show that they *aren't* mutually exclusive.

Consider these two GeneratorDeclarations:

function * gen() { `pre${yield /.*/}post`; }
function * gen() { `pre${yield }post`; }

(I don't know why anyone would write that, but I believe they're both syntactically valid.)
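One way to check the validity claim is to hand both declarations to an engine that implements ES2015 generators and template literals, such as a current Node.js. The sketch below compiles each one with the `Function` constructor, which throws a SyntaxError on invalid source. It then runs the first generator to show that `/.*/` after `yield` is lexed as a RegularExpressionLiteral. This reflects the finished ES2015 behaviour, not rev33 itself.

```javascript
// Both GeneratorDeclarations from the bug, as function-body source strings.
const withRegExp = "function * gen() { `pre${yield /.*/}post`; } return gen;";
const bareYield = "function * gen() { `pre${yield }post`; } return gen;";

// `new Function(body)` parses `body` and throws a SyntaxError if it is invalid,
// so getting past these two lines means both declarations parsed.
const genA = new Function(withRegExp)();
const genB = new Function(bareYield)();

// In the first generator, `/.*/` after `yield` is a regular expression literal.
const itA = genA();
const first = itA.next();
console.log(first.value instanceof RegExp, String(first.value)); // true /.*/

// In the second, `yield` has no operand, so it yields undefined and the
// following `}` continues the template.
const itB = genB();
console.log(itB.next().value); // undefined
```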

And now consider the state that a parser is in having consumed just

function * gen() { `pre${yield

As shown above, the set of valid next tokens includes both RegularExpressionLiteral and TemplateTail. Thus, the contexts in which the lexical goal symbols InputElementRegExp and InputElementTemplateTail are appropriate are *not* mutually exclusive.

In fact, there's no goal symbol of the lexical grammar that derives both RegularExpressionLiteral and TemplateTail. So, given this left-context, there is no always-correct choice for the lexical goal symbol to use to get the next token.
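To make that concrete, the sketch below writes each of the three goal symbols as the set of input-element kinds that distinguish it. The right-hand sides are taken from the published ES2015 grammar, which may name things differently from rev33. WhiteSpace, LineTerminator, Comment and the shared token alternative are left out because every goal has them. TemplateSubstitutionTail is the symbol that derives TemplateMiddle and TemplateTail.

```javascript
// Distinguishing right-hand-side symbols of the three lexical goals
// (ES2015 clause 11; the alternatives common to all goals are omitted).
const goals = {
  InputElementDiv: new Set(["DivPunctuator", "RightBracePunctuator"]),
  InputElementRegExp: new Set(["RightBracePunctuator", "RegularExpressionLiteral"]),
  InputElementTemplateTail: new Set(["DivPunctuator", "TemplateSubstitutionTail"]),
};

// Two token kinds that are both valid right after a left context ending in `${yield`.
const needed = ["RegularExpressionLiteral", "TemplateSubstitutionTail"];

// Goals that can produce every needed kind.
const candidates = Object.keys(goals).filter((name) =>
  needed.every((kind) => goals[name].has(kind))
);
console.log(candidates); // []
```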

(A real-world parser would maybe sniff the next character [after skipping WhiteSpace etc] and then make the necessary choice of lexical goal symbol. But that seems like a kludgey kind of thing for the spec to say.)
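For comparison, the sniffing approach might look roughly like the following hypothetical helper. It is not anything the spec defines. It skips whitespace and comments in simplified form (ASCII whitespace only, no diagnosis of unterminated comments), then chooses a goal from the next character. Only the `${yield` left context is handled. A `}` there can only close the substitution, so the template continues. Anything else is lexed with InputElementRegExp.

```javascript
// Hypothetical sniffing helper for the `${yield` left context only.
// Returns the lexical goal for the token that starts at or after `pos`.
function sniffGoalAfterYieldInTemplate(src, pos) {
  let i = pos;
  for (;;) {
    const c = src[i];
    if (c === " " || c === "\t" || c === "\n" || c === "\r") {
      i += 1; // simplified WhiteSpace / LineTerminator
    } else if (src.startsWith("//", i)) {
      const nl = src.indexOf("\n", i);
      i = nl === -1 ? src.length : nl; // single-line comment
    } else if (src.startsWith("/*", i)) {
      const end = src.indexOf("*/", i + 2);
      i = end === -1 ? src.length : end + 2; // multi-line comment
    } else {
      break;
    }
  }
  // `}` closes the substitution (TemplateMiddle/TemplateTail follows);
  // otherwise a `/` here must start a RegularExpressionLiteral.
  return src[i] === "}" ? "InputElementTemplateTail" : "InputElementRegExp";
}

const a = "function * gen() { `pre${yield /.*/}post`; }";
const b = "function * gen() { `pre${yield }post`; }";
const after = (s) => s.indexOf("yield") + "yield".length;
console.log(sniffGoalAfterYieldInTemplate(a, after(a))); // InputElementRegExp
console.log(sniffGoalAfterYieldInTemplate(b, after(b))); // InputElementTemplateTail
```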


I guess I need to add

InputElementRegExpOrTemplateTail ::
WhiteSpace
LineTerminator
Comment
Token
DivPunctuator
TemplateSubstitutionTail

and probably a note with the yield grammar


(In reply to Allen Wirfs-Brock from comment #1)

I meant:

>
> InputElementRegExpOrTemplateTail ::
> WhiteSpace
> LineTerminator
> Comment
> Token
RegularExpressionLiteral
> TemplateSubstitutionTail
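The correction matters. As first written in comment #1 (with DivPunctuator), the production would have had the same right-hand side as InputElementTemplateTail, so it would not have helped with a `/` after `yield`. The sketch below compares the two versions as sets of symbol names. It assumes the ES2015 form of InputElementTemplateTail, where the token alternative is spelled CommonToken; here it is written Token to match the comments.

```javascript
// Right-hand sides as sets of symbol names. InputElementTemplateTail follows
// the ES2015 text, with CommonToken spelled Token as in the comments above.
const templateTail = new Set(["WhiteSpace", "LineTerminator", "Comment", "Token",
  "DivPunctuator", "TemplateSubstitutionTail"]);
const asFirstWritten = new Set(["WhiteSpace", "LineTerminator", "Comment", "Token",
  "DivPunctuator", "TemplateSubstitutionTail"]);
const asCorrected = new Set(["WhiteSpace", "LineTerminator", "Comment", "Token",
  "RegularExpressionLiteral", "TemplateSubstitutionTail"]);

const sameSet = (x, y) => x.size === y.size && [...x].every((s) => y.has(s));
const coversBoth = (g) =>
  g.has("RegularExpressionLiteral") && g.has("TemplateSubstitutionTail");

console.log(sameSet(asFirstWritten, templateTail)); // true: adds nothing new
console.log(coversBoth(asFirstWritten), coversBoth(asCorrected)); // false true
```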


fixed in rev34 editor's draft


Hm, yeah, I think that might work.

So then the rule could be:

If the context allows both RegularExpressionLiteral and TemplateTail,
use InputElementRegExpOrTemplateTail

If the context allows RegularExpressionLiteral but not TemplateTail,
use InputElementRegExp.

If the context allows TemplateTail but not RegularExpressionLiteral,
use InputElementTemplateTail.

Otherwise, use InputElementDiv.

(I'm assuming you can leave out mention of TemplateMiddle because the contexts in which it's allowed are the same as for TemplateTail.)


> Otherwise, use InputElementDiv.

Or, for more parallelism,
If the context allows neither TemplateTail nor RegularExpressionLiteral,
use InputElementDiv.


Or use a table like this:

                                             TemplateTail is allowed             TemplateTail is not allowed
RegularExpressionLiteral is allowed          InputElementRegExpOrTemplateTail    InputElementRegExp
RegularExpressionLiteral is not allowed      InputElementTemplateTail            InputElementDiv
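Each row of that table maps onto one branch of a small function. As before, the two boolean inputs are a hypothetical summary of the parser state, not spec names. TemplateTail also stands for TemplateMiddle, per the assumption stated earlier in this thread.

```javascript
// Row: is RegularExpressionLiteral allowed? Column: is TemplateTail allowed?
function lexicalGoal(allowsRegExp, allowsTemplateTail) {
  if (allowsRegExp) {
    return allowsTemplateTail ? "InputElementRegExpOrTemplateTail" : "InputElementRegExp";
  }
  return allowsTemplateTail ? "InputElementTemplateTail" : "InputElementDiv";
}

// The `${yield` context from this bug allows both.
console.log(lexicalGoal(true, true)); // InputElementRegExpOrTemplateTail
```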


fixed in rev34