#4148 — 11.6.1.1 Static Semantics: Early Errors


> IdentifierStart :: \ UnicodeEscapeSequence
>
> It is a Syntax Error if SV(UnicodeEscapeSequence) is neither the UTF16Encoding (10.1.1) of a single Unicode code point with the Unicode property “ID_Start” nor "$" or "_".

`Other_ID_Start` is missing.

> IdentifierPart :: \ UnicodeEscapeSequence
>
> It is a Syntax Error if SV(UnicodeEscapeSequence) is neither the UTF16Encoding (10.1.1) of a single Unicode code point with the Unicode property “ID_Continue” nor "$" or "_" nor the UTF16Encoding of either <ZWNJ> or <ZWJ>.

`Other_ID_Continue` and `Other_ID_Start` are missing.

Section 11.6 does mention those in the definitions of `UnicodeIDStart` and `UnicodeIDContinue`, so either this section should too, or such mentions should be removed everywhere.
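
For illustration only, the two early-error rules quoted above can be sketched in Python as follows. The function names and the tiny `ID_START` / `ID_CONTINUE` sample sets are hypothetical and are not taken from the spec; a real implementation would use the full property sets from the Unicode Character Database. The sample deliberately places U+2118 (an `Other_ID_Start` code point) in `ID_START`, which is an assumption about how the property is meant to be read.

```python
ZWNJ, ZWJ = 0x200C, 0x200D

# Tiny illustrative samples, NOT the full Unicode property sets (assumption).
ID_START = {ord("a"), ord("Z"), 0x2118}
ID_CONTINUE = ID_START | {ord("1"), 0x00B7}


def start_escape_ok(cp):
    """True if an escape denoting code point cp passes the IdentifierStart rule."""
    return cp in ID_START or cp in (ord("$"), ord("_"))


def part_escape_ok(cp):
    """True if an escape denoting code point cp passes the IdentifierPart rule."""
    return cp in ID_CONTINUE or cp in (ord("$"), ord("_"), ZWNJ, ZWJ)


print(start_escape_ok(0x2118))  # True
print(start_escape_ok(0x00B7))  # False
print(part_escape_ok(0x00B7))   # True
print(part_escape_ok(ZWJ))      # True
print(start_escape_ok(ZWJ))     # False
```

Anything for which the relevant function returns `False` would be a Syntax Error under the quoted rules.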


For the record, it seems explicitly mentioning `Other_ID_Start` in the definition of `UnicodeIDContinue` is redundant.


(In reply to Mathias Bynens from comment #1)
> For the record, it seems explicitly mentioning `Other_ID_Start` in the
> definition of `UnicodeIDContinue` is redundant.

No, see bug 3027.


Unicode 7.0, PropList.txt:

# ================================================

2118 ; Other_ID_Start # Sm SCRIPT CAPITAL P
212E ; Other_ID_Start # So ESTIMATED SYMBOL
309B..309C ; Other_ID_Start # Sk [2] KATAKANA-HIRAGANA VOICED SOUND MARK..KATAKANA-HIRAGANA SEMI-VOICED SOUND MARK

# Total code points: 4

# ================================================

00B7 ; Other_ID_Continue # Po MIDDLE DOT
0387 ; Other_ID_Continue # Po GREEK ANO TELEIA
1369..1371 ; Other_ID_Continue # No [9] ETHIOPIC DIGIT ONE..ETHIOPIC DIGIT NINE
19DA ; Other_ID_Continue # No NEW TAI LUE THAM DIGIT ONE

# Total code points: 12

# ================================================


fixed in rev36 editor's draft: the early error problem


(In reply to André Bargull from comment #2)
> Unicode 7.0, PropList.txt:
>
> # ================================================
>
> 2118 ; Other_ID_Start # Sm SCRIPT CAPITAL P
> 212E ; Other_ID_Start # So ESTIMATED SYMBOL
> 309B..309C ; Other_ID_Start # Sk [2] KATAKANA-HIRAGANA VOICED SOUND
> MARK..KATAKANA-HIRAGANA SEMI-VOICED SOUND MARK
>
> # Total code points: 4
>
> # ================================================
>
> 00B7 ; Other_ID_Continue # Po MIDDLE DOT
> 0387 ; Other_ID_Continue # Po GREEK ANO TELEIA
> 1369..1371 ; Other_ID_Continue # No [9] ETHIOPIC DIGIT ONE..ETHIOPIC
> DIGIT NINE
> 19DA ; Other_ID_Continue # No NEW TAI LUE THAM DIGIT ONE
>
> # Total code points: 12
>
> # ================================================

Exactly — All those `Other_ID_Start` code points are already in `ID_Continue` anyway.

To quickly confirm that explicitly including `Other_ID_Start` for `UnicodeIDContinue` doesn’t make a difference, use a script like https://gist.github.com/mathiasbynens/6334847. The latest revision of that gist didn’t cause any changes whatsoever in the generated output.

Btw, rather than `PropList.txt` it seems `DerivedCoreProperties.txt` should be used as per http://unicode.org/reports/tr31/#Table_Lexical_Classes_for_Identifiers.


(In reply to Mathias Bynens from comment #4)
> Exactly — All those `Other_ID_Start` code points are already in
> `ID_Continue` anyway.

Err, in bug 2717 you've requested to explicitly state that Other_ID_Start and Other_ID_Continue are included.


> Btw, rather than `PropList.txt` it seems `DerivedCoreProperties.txt` should
> be used as per
> http://unicode.org/reports/tr31/#Table_Lexical_Classes_for_Identifiers.

Other_ID_Start and Other_ID_Continue are defined in PropList.txt.


So, if UnicodeIDContinue mentions Other_ID_Continue it should also mention Other_ID_Start, because Other_ID_Start ∩ Other_ID_Continue = ∅.
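
The disjointness claim can be checked against the `PropList.txt` excerpt quoted earlier in this thread. The following is a minimal Python sketch; the `parse_proplist` helper is a hypothetical name, and the input is only the Unicode 7.0 lines pasted above, not the full file.

```python
PROPLIST_EXCERPT = """\
2118 ; Other_ID_Start # Sm SCRIPT CAPITAL P
212E ; Other_ID_Start # So ESTIMATED SYMBOL
309B..309C ; Other_ID_Start # Sk [2] KATAKANA-HIRAGANA VOICED SOUND MARK..KATAKANA-HIRAGANA SEMI-VOICED SOUND MARK
00B7 ; Other_ID_Continue # Po MIDDLE DOT
0387 ; Other_ID_Continue # Po GREEK ANO TELEIA
1369..1371 ; Other_ID_Continue # No [9] ETHIOPIC DIGIT ONE..ETHIOPIC DIGIT NINE
19DA ; Other_ID_Continue # No NEW TAI LUE THAM DIGIT ONE
"""


def parse_proplist(text):
    """Map each property name to the set of code points listed for it."""
    props = {}
    for line in text.splitlines():
        data = line.split("#", 1)[0].strip()  # drop the trailing comment
        if not data:
            continue
        code_range, prop = (field.strip() for field in data.split(";"))
        first, _, last = code_range.partition("..")
        start = int(first, 16)
        end = int(last, 16) if last else start  # single code point or range
        props.setdefault(prop, set()).update(range(start, end + 1))
    return props


props = parse_proplist(PROPLIST_EXCERPT)
other_id_start = props["Other_ID_Start"]
other_id_continue = props["Other_ID_Continue"]

print(len(other_id_start))                 # 4
print(len(other_id_continue))              # 12
print(other_id_start & other_id_continue)  # set()
```

The counts match the "Total code points" lines in the excerpt, and the empty intersection confirms that the two properties share no code points in that data.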


(In reply to André Bargull from comment #5)
> (In reply to Mathias Bynens from comment #4)
> > Exactly — All those `Other_ID_Start` code points are already in
> > `ID_Continue` anyway.
>
> Err, in bug 2717 you've requested to explicitly state that Other_ID_Start
> and Other_ID_Continue are included.

I asked to explicitly include `Other_ID_Start` alongside `ID_Start` and `Other_ID_Continue` alongside `ID_Continue` (just to clarify/repeat what UAX #31 says). How is this relevant to this discussion, though? IIUC this is a different issue, about adding `Other_ID_Start` to `UnicodeIDContinue`.

> Other_ID_Start and Other_ID_Continue are defined in PropList.txt.

Why should it be used over `DerivedCoreProperties.txt`? To me http://www.unicode.org/reports/tr44/#Simple_Derived sounds like `DerivedCoreProperties.txt` is the reference for derived core properties such as `ID_Start` and `ID_Continue`.


(In reply to André Bargull from comment #6)
> So, if UnicodeIDContinue mentions Other_ID_Continue it should also mention
> Other_ID_Start, because Other_ID_Start ∩ Other_ID_Continue = ∅.

Right, I’m just saying (in comment #1) that it doesn’t make a difference, and it never will, since `ID_Continue` already contains those code points anyway. See the definition of `ID_Continue` in UAX #31 (http://unicode.org/reports/tr31/#Table_Lexical_Classes_for_Identifiers). `ID_Continue` is a superset of `ID_Start` (including `Other_ID_Start`), so this is a purely theoretical issue.
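
To make the superset argument concrete, here is a rough Python sketch of how UAX #31 derives the two properties from General_Category values. It is a simplification under stated assumptions: the subtraction of `Pattern_Syntax` and `Pattern_White_Space` code points is omitted, the `Other_*` sets are hard-coded from the Unicode 7.0 excerpt quoted in this thread, and category lookups use whatever Unicode version ships with the local Python. The function names are hypothetical.

```python
import unicodedata

# Hard-coded from the PropList.txt excerpt quoted in this thread (Unicode 7.0).
OTHER_ID_START = {0x2118, 0x212E, 0x309B, 0x309C}
OTHER_ID_CONTINUE = {0x00B7, 0x0387, 0x19DA, *range(0x1369, 0x1372)}

START_CATEGORIES = {"Lu", "Ll", "Lt", "Lm", "Lo", "Nl"}
CONTINUE_CATEGORIES = {"Mn", "Mc", "Nd", "Pc"}


def is_id_start(cp):
    """Approximate ID_Start: letters, letter numbers, plus Other_ID_Start."""
    return unicodedata.category(chr(cp)) in START_CATEGORIES or cp in OTHER_ID_START


def is_id_continue(cp):
    """Approximate ID_Continue: everything in ID_Start plus marks, digits, etc."""
    return (
        is_id_start(cp)
        or unicodedata.category(chr(cp)) in CONTINUE_CATEGORIES
        or cp in OTHER_ID_CONTINUE
    )


# Other_ID_Start code points are not letters by category, so they reach
# ID_Start only through the explicit addition ...
print(any(unicodedata.category(chr(cp)) in START_CATEGORIES for cp in OTHER_ID_START))  # False
# ... and they reach ID_Continue because ID_Continue builds on ID_Start.
print(all(is_id_continue(cp) for cp in OTHER_ID_START))  # True
# Other_ID_Continue code points, by contrast, are not ID_Start.
print(any(is_id_start(cp) for cp in OTHER_ID_CONTINUE))  # False
```

Because `is_id_continue` starts from `is_id_start`, listing `Other_ID_Start` again next to `ID_Continue` cannot change the resulting set.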


(In reply to Mathias Bynens from comment #8)
> (In reply to André Bargull from comment #6)
> > So, if UnicodeIDContinue mentions Other_ID_Continue it should also mention
> > Other_ID_Start, because Other_ID_Start ∩ Other_ID_Continue = ∅.
>
> Right — I’m just saying (in comment #1) it doesn’t make a difference and it
> never will since `ID_Continue` already contains those code points anyway.
> See the definition of `ID_Continue` here:
> http://unicode.org/reports/tr31/#Table_Lexical_Classes_for_Identifiers
> `ID_Continue` is a superset of `ID_Start` (including `Other_ID_Start`), so
> this is a purely theoretical issue.

It's just misleading to leave off Other_ID_Start in UnicodeIDContinue if Other_ID_Continue is explicitly included.


(In reply to Mathias Bynens from comment #7)
> Why should it be used over `DerivedCoreProperties.txt`? To me
> http://www.unicode.org/reports/tr44/#Simple_Derived sounds like
> `DerivedCoreProperties.txt` is the reference for derived core properties
> such as `ID_Start` and `ID_Continue`.

I don't understand that question. :-(

In comment 2 I've copy-pasted the definitions for Other_ID_Start and Other_ID_Continue from PropList.txt.
In comment 3 you've responded to comment 2, and said that DerivedCoreProperties.txt instead of PropList.txt should be used (*).
In comment 5 I've responded to comment 3, and said that Other_ID_Start and Other_ID_Continue are defined in (and only in) PropList.txt.

(*) Most likely this is the point where we started to talk about different subjects. I've continued to say that Other_ID_Start and Other_ID_Continue are defined in PropList.txt. And you were talking about ID_Start and ID_Continue which are defined in DerivedCoreProperties.txt.


(In reply to André Bargull from comment #10)
> (In reply to Mathias Bynens from comment #7)
> > Why should it be used over `DerivedCoreProperties.txt`? To me
> > http://www.unicode.org/reports/tr44/#Simple_Derived sounds like
> > `DerivedCoreProperties.txt` is the reference for derived core properties
> > such as `ID_Start` and `ID_Continue`.
>
> I don't understand that question. :-(
>
> In comment 2 I've copy-pasted the definitions for Other_ID_Start and
> Other_ID_Continue from PropList.txt.
> In comment 3 you've responded to comment 2, and said that
> DerivedCoreProperties.txt instead of PropList.txt should be used (*).
> In comment 5 I've responded to comment 3, and said that Other_ID_Start and
> Other_ID_Continue are defined in (and only in) PropList.txt.
>
> (*) Most likely this is the point where we started to talk about different
> subjects. I've continued to say that Other_ID_Start and Other_ID_Continue
> are defined in PropList.txt. And you were talking about ID_Start and
> ID_Continue which are defined in DerivedCoreProperties.txt.

My point was that neither `Other_ID_Start` nor `Other_ID_Continue` is needed if we just use the `ID_Start` and `ID_Continue` listings in `DerivedCoreProperties.txt`, since those already include `Other_ID_Start` and `Other_ID_Continue` respectively.


(In reply to Mathias Bynens from comment #11)
> My point was that neither `Other_ID_Start` nor `Other_ID_Continue` is needed
> if we just use the `ID_Start` and `ID_Continue` listings in
> `DerivedCoreProperties.txt`, since those include `Other_ID_Start` and
> `Other_ID_Continue` respectively already.

Yeah sure. Do you agree that removing `Other_ID_Start` and `Other_ID_Continue` from the definitions in UnicodeID{Start, Continue} and instead adding a note is a better (cleaner) way to define the set of allowed identifier characters? Because that should solve the whole redundancy issue.

---
UnicodeIDStart ::
any Unicode code point with the Unicode property “ID_Start”

UnicodeIDContinue ::
any Unicode code point with the Unicode property “ID_Continue”

NOTE: Grandfathered characters defined in “Other_ID_Start” and “Other_ID_Continue” must be recognized/supported by a compliant implementation.
---


(In reply to André Bargull from comment #12)
> (In reply to Mathias Bynens from comment #11)
> > My point was that neither `Other_ID_Start` nor `Other_ID_Continue` is needed
> > if we just use the `ID_Start` and `ID_Continue` listings in
> > `DerivedCoreProperties.txt`, since those include `Other_ID_Start` and
> > `Other_ID_Continue` respectively already.
>
> Yeah sure. Do you agree that removing `Other_ID_Start` and
> `Other_ID_Continue` from the definitions in UnicodeID{Start, Continue} and
> instead adding a note is a better (cleaner) way to define the set of allowed
> identifier characters? Because that should solve the whole redundancy issue.
>
> ---
> UnicodeIDStart ::
> any Unicode code point with the Unicode property “ID_Start”
>
> UnicodeIDContinue ::
> any Unicode code point with the Unicode property “ID_Continue”
>
> NOTE: Grandfathered characters defined in “Other_ID_Start” and
> “Other_ID_Continue” must be recognized/supported by a compliant
> implementation.
> ---

That was my intention when filing bug 2717 :) Sounds good.


(In reply to André Bargull from comment #12)
...
>
> ---
> UnicodeIDStart ::
> any Unicode code point with the Unicode property “ID_Start”
>
> UnicodeIDContinue ::
> any Unicode code point with the Unicode property “ID_Continue”
>
> NOTE: Grandfathered characters defined in “Other_ID_Start” and
> “Other_ID_Continue” must be recognized/supported by a compliant
> implementation.
> ---

A NOTE can't express a normative requirement. However, if we agree that the definitions of ID_Start and ID_Continue are normatively sufficient, then we could have a NOTE that says something like:

NOTE The sets of code points with Unicode properties ID_Start and ID_Continue include, respectively, the code points with Unicode properties Other_ID_Start and Other_ID_Continue.


I agree that all normative references to Other_ID_Start and Other_ID_Continue should be removed, and Allen's proposed note should be added.


fixed in rev26 editor's draft


in rev36