#4148 — 11.6.1.1 Static Semantics: Early Errors

bug_id: 4148
creation_ts: 2015-03-09 07:44:00 -0700
short_desc: 11.6.1.1 Static Semantics: Early Errors
delta_ts: 2015-03-17 16:57:06 -0700
product: Draft for 6th Edition
component: technical issue
version: Rev 35: March 4, 2015 Release Candidate 2
rep_platform: All
op_sys: All
bug_status: RESOLVED
resolution: FIXED
priority: Normal
bug_severity: enhancement
everconfirmed: true
reporter: Mathias Bynens
assigned_to: Allen Wirfs-Brock
cc: ["andrebargull", "ecmascriptbugs", "mathias"]

commentid: 13658
comment_count: 0
who: Mathias Bynens
bug_when: 2015-03-09 07:44:34 -0700

> IdentifierStart :: \ UnicodeEscapeSequence
>
> It is a Syntax Error if SV(UnicodeEscapeSequence) is neither the UTF16Encoding (10.1.1) of a single Unicode code point with the Unicode property “ID_Start” nor "$" or "_".

`Other_ID_Start` is missing.

> IdentifierPart :: \ UnicodeEscapeSequence
>
> It is a Syntax Error if SV(UnicodeEscapeSequence) is neither the UTF16Encoding (10.1.1) of a single Unicode code point with the Unicode property “ID_Continue” nor "$" or "_" nor the UTF16Encoding of either <ZWNJ> or <ZWJ>.

`Other_ID_Continue` and `Other_ID_Start` are missing.

Section 11.6 does mention those in the definitions of `UnicodeIDStart` and `UnicodeIDContinue` so either this section should too, or such mentions should be removed everywhere.
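For illustration only, the two early-error conditions quoted above can be sketched as predicates over a code point. This assumes a runtime with ES2018 Unicode property escapes (`\p{ID_Start}`, `\p{ID_Continue}`), which did not exist when this bug was filed; the function names are hypothetical and not taken from the spec. Because the derived `ID_Start` and `ID_Continue` properties already fold in `Other_ID_Start` and `Other_ID_Continue`, no separate check for those is written here.

```javascript
// Hypothetical helpers mirroring the quoted early-error rules.
// Assumption: the engine supports ES2018 Unicode property escapes.
const ZWNJ = 0x200c; // ZERO WIDTH NON-JOINER
const ZWJ = 0x200d; // ZERO WIDTH JOINER

const ID_START = /^\p{ID_Start}$/u;
const ID_CONTINUE = /^\p{ID_Continue}$/u;

// IdentifierStart :: \ UnicodeEscapeSequence
function isValidIdentifierStartEscape(codePoint) {
  const ch = String.fromCodePoint(codePoint);
  return ch === "$" || ch === "_" || ID_START.test(ch);
}

// IdentifierPart :: \ UnicodeEscapeSequence
function isValidIdentifierPartEscape(codePoint) {
  const ch = String.fromCodePoint(codePoint);
  return (
    ch === "$" ||
    ch === "_" ||
    codePoint === ZWNJ ||
    codePoint === ZWJ ||
    ID_CONTINUE.test(ch)
  );
}

// U+2118 SCRIPT CAPITAL P is an Other_ID_Start code point.
console.log(isValidIdentifierStartEscape(0x2118));
// U+00B7 MIDDLE DOT is an Other_ID_Continue code point.
console.log(isValidIdentifierStartEscape(0x00b7));
console.log(isValidIdentifierPartEscape(0x00b7));
```

The results depend on the Unicode version shipped with the runtime, so this is a sanity check rather than a conformance test.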

commentid: 13659
comment_count: 1
who: Mathias Bynens
bug_when: 2015-03-09 07:46:41 -0700

For the record, it seems explicitly mentioning `Other_ID_Start` in the definition of `UnicodeIDContinue` is redundant.

commentid: 13660
comment_count: 2
who: André Bargull
bug_when: 2015-03-09 08:25:43 -0700

(In reply to Mathias Bynens from comment #1)
> For the record, it seems explicitly mentioning `Other_ID_Start` in the
> definition of `UnicodeIDContinue` is redundant.

No, see bug 3027.

Unicode 7.0, PropList.txt:

# ================================================

2118 ; Other_ID_Start # Sm SCRIPT CAPITAL P
212E ; Other_ID_Start # So ESTIMATED SYMBOL
309B..309C ; Other_ID_Start # Sk [2] KATAKANA-HIRAGANA VOICED SOUND MARK..KATAKANA-HIRAGANA SEMI-VOICED SOUND MARK

# Total code points: 4

# ================================================

00B7 ; Other_ID_Continue # Po MIDDLE DOT
0387 ; Other_ID_Continue # Po GREEK ANO TELEIA
1369..1371 ; Other_ID_Continue # No [9] ETHIOPIC DIGIT ONE..ETHIOPIC DIGIT NINE
19DA ; Other_ID_Continue # No NEW TAI LUE THAM DIGIT ONE

# Total code points: 12

# ================================================
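As a rough cross-check of the totals in the excerpt above, the lines can be parsed and their ranges expanded. This is a minimal sketch: `parsePropList` is a hypothetical helper, and the input is only the excerpt pasted in this comment, not the full `PropList.txt`.

```javascript
// The data lines from the PropList.txt excerpt quoted in this comment.
const excerpt = `
2118 ; Other_ID_Start # Sm SCRIPT CAPITAL P
212E ; Other_ID_Start # So ESTIMATED SYMBOL
309B..309C ; Other_ID_Start # Sk [2] KATAKANA-HIRAGANA VOICED SOUND MARK..KATAKANA-HIRAGANA SEMI-VOICED SOUND MARK
00B7 ; Other_ID_Continue # Po MIDDLE DOT
0387 ; Other_ID_Continue # Po GREEK ANO TELEIA
1369..1371 ; Other_ID_Continue # No [9] ETHIOPIC DIGIT ONE..ETHIOPIC DIGIT NINE
19DA ; Other_ID_Continue # No NEW TAI LUE THAM DIGIT ONE
`;

// Hypothetical helper: returns a Map from property name to an array of code points.
function parsePropList(text) {
  const sets = new Map();
  for (const rawLine of text.split("\n")) {
    const line = rawLine.split("#")[0].trim(); // drop the trailing comment
    if (line === "") continue;
    const [range, property] = line.split(";").map((part) => part.trim());
    const [first, last = first] = range.split(".."); // single code point or range
    const codePoints = sets.get(property) || [];
    for (let cp = parseInt(first, 16); cp <= parseInt(last, 16); cp++) {
      codePoints.push(cp);
    }
    sets.set(property, codePoints);
  }
  return sets;
}

const sets = parsePropList(excerpt);
for (const [property, codePoints] of sets) {
  console.log(`${property}: ${codePoints.length} code points`);
}
```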

commentid: 13661
comment_count: 3
who: Allen Wirfs-Brock
bug_when: 2015-03-09 08:59:48 -0700

Fixed the early error problem in the rev36 editor's draft.

commentid: 13662
comment_count: 4
who: Mathias Bynens
bug_when: 2015-03-09 09:28:41 -0700

(In reply to André Bargull from comment #2)
> Unicode 7.0, PropList.txt:
>
> # ================================================
>
> 2118 ; Other_ID_Start # Sm SCRIPT CAPITAL P
> 212E ; Other_ID_Start # So ESTIMATED SYMBOL
> 309B..309C ; Other_ID_Start # Sk [2] KATAKANA-HIRAGANA VOICED SOUND
> MARK..KATAKANA-HIRAGANA SEMI-VOICED SOUND MARK
>
> # Total code points: 4
>
> # ================================================
>
> 00B7 ; Other_ID_Continue # Po MIDDLE DOT
> 0387 ; Other_ID_Continue # Po GREEK ANO TELEIA
> 1369..1371 ; Other_ID_Continue # No [9] ETHIOPIC DIGIT ONE..ETHIOPIC
> DIGIT NINE
> 19DA ; Other_ID_Continue # No NEW TAI LUE THAM DIGIT ONE
>
> # Total code points: 12
>
> # ================================================

Exactly — All those `Other_ID_Start` code points are already in `ID_Continue` anyway.

To quickly confirm that explicitly including `Other_ID_Start` for `UnicodeIDContinue` doesn’t make a difference, use a script like https://gist.github.com/mathiasbynens/6334847. See the latest revision on https://gist.github.com/mathiasbynens/6334847 which didn’t cause any changes whatsoever in the generated output.
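A much smaller check than the gist, shown here only as a sketch, is to test the four Unicode 7.0 `Other_ID_Start` code points from comment #2 against `ID_Continue` directly. It assumes a runtime with ES2018 Unicode property escapes and is not the script referenced above.

```javascript
// The four Other_ID_Start code points listed in comment #2 (Unicode 7.0).
const otherIdStart = [0x2118, 0x212e, 0x309b, 0x309c];

// Assumption: the engine supports \p{ID_Continue} (ES2018).
const ID_CONTINUE = /^\p{ID_Continue}$/u;

const missingFromIdContinue = otherIdStart.filter(
  (cp) => !ID_CONTINUE.test(String.fromCodePoint(cp))
);

console.log(
  missingFromIdContinue.length === 0
    ? "All Other_ID_Start code points are already in ID_Continue"
    : "Not in ID_Continue: " +
        missingFromIdContinue.map((cp) => "U+" + cp.toString(16).toUpperCase()).join(", ")
);
```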

Btw, rather than `PropList.txt` it seems `DerivedCoreProperties.txt` should be used as per http://unicode.org/reports/tr31/#Table_Lexical_Classes_for_Identifiers.

commentid: 13663
comment_count: 5
who: André Bargull
bug_when: 2015-03-09 09:49:14 -0700

(In reply to Mathias Bynens from comment #4)
> Exactly — All those `Other_ID_Start` code points are already in
> `ID_Continue` anyway.

Err, in bug 2717 you've requested to explicitly state that Other_ID_Start and Other_ID_Continue are included.

> Btw, rather than `PropList.txt` it seems `DerivedCoreProperties.txt` should
> be used as per
> http://unicode.org/reports/tr31/#Table_Lexical_Classes_for_Identifiers.

Other_ID_Start and Other_ID_Continue are defined in PropList.txt.

commentid: 13664
comment_count: 6
who: André Bargull
bug_when: 2015-03-09 10:00:02 -0700

So, if UnicodeIDContinue mentions Other_ID_Continue it should also mention Other_ID_Start, because Other_ID_Start ∩ Other_ID_Continue = ∅.

commentid: 13665
comment_count: 7
who: Mathias Bynens
bug_when: 2015-03-09 10:03:02 -0700

(In reply to André Bargull from comment #5)
> (In reply to Mathias Bynens from comment #4)
> > Exactly — All those `Other_ID_Start` code points are already in
> > `ID_Continue` anyway.
>
> Err, in bug 2717 you've requested to explicitly state that Other_ID_Start
> and Other_ID_Continue are included.

I asked to explicitly include `Other_ID_Start` alongside `ID_Start` and `Other_ID_Continue` alongside `ID_Continue` (just to clarify/repeat what UAX #31 says). How is this relevant to this discussion, though? IIUC this is a different issue, about adding `Other_ID_Start` to `UnicodeIDContinue`.

> Other_ID_Start and Other_ID_Continue are defined in PropList.txt.

Why should it be used over `DerivedCoreProperties.txt`? To me http://www.unicode.org/reports/tr44/#Simple_Derived sounds like `DerivedCoreProperties.txt` is the reference for derived core properties such as `ID_Start` and `ID_Continue`.

commentid: 13667
comment_count: 8
who: Mathias Bynens
bug_when: 2015-03-09 10:11:48 -0700

(In reply to André Bargull from comment #6)
> So, if UnicodeIDContinue mentions Other_ID_Continue it should also mention
> Other_ID_Start, because Other_ID_Start ∩ Other_ID_Continue = ∅.

Right, I’m just saying (in comment #1) that it doesn’t make a difference and never will, since `ID_Continue` already contains those code points anyway. See the definition of `ID_Continue` here: http://unicode.org/reports/tr31/#Table_Lexical_Classes_for_Identifiers. `ID_Continue` is a superset of `ID_Start` (including `Other_ID_Start`), so this is a purely theoretical issue.
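The superset claim can also be checked empirically by scanning the whole code point range. The following is a sketch under the assumption that the runtime supports ES2018 Unicode property escapes; `findStartNotContinue` is a hypothetical name, and the outcome reflects whichever Unicode version the runtime ships.

```javascript
// Assumption: the engine supports \p{ID_Start} and \p{ID_Continue} (ES2018).
const ID_START = /^\p{ID_Start}$/u;
const ID_CONTINUE = /^\p{ID_Continue}$/u;

// Returns every code point that has ID_Start but lacks ID_Continue.
function findStartNotContinue() {
  const offenders = [];
  for (let cp = 0; cp <= 0x10ffff; cp++) {
    if (cp >= 0xd800 && cp <= 0xdfff) continue; // skip surrogate code points
    const ch = String.fromCodePoint(cp);
    if (ID_START.test(ch) && !ID_CONTINUE.test(ch)) {
      offenders.push(cp);
    }
  }
  return offenders;
}

const offenders = findStartNotContinue();
console.log(`ID_Start code points missing from ID_Continue: ${offenders.length}`);
```

An empty result is what the UAX #31 definition predicts, since `ID_Continue` is defined as `ID_Start` plus additional categories.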

commentid: 13668
comment_count: 9
who: André Bargull
bug_when: 2015-03-09 10:17:16 -0700

(In reply to Mathias Bynens from comment #8)
> (In reply to André Bargull from comment #6)
> > So, if UnicodeIDContinue mentions Other_ID_Continue it should also mention
> > Other_ID_Start, because Other_ID_Start ∩ Other_ID_Continue = ∅.
>
> Right — I’m just saying (in comment #1) it doesn’t make a difference and it
> never will since `ID_Continue` already contains those code points anyway.
> See the definition of `ID_Continue` here:
> http://unicode.org/reports/tr31/#Table_Lexical_Classes_for_Identifiers
> `ID_Continue` is a superset of `ID_Start` (including `Other_ID_Start`), so
> this is a purely theoretical issue.

It's just misleading to leave off Other_ID_Start in UnicodeIDContinue if Other_ID_Continue is explicitly included.

commentid: 13669
comment_count: 10
who: André Bargull
bug_when: 2015-03-09 10:26:42 -0700

(In reply to Mathias Bynens from comment #7)
> Why should it be used over `DerivedCoreProperties.txt`? To me
> http://www.unicode.org/reports/tr44/#Simple_Derived sounds like
> `DerivedCoreProperties.txt` is the reference for derived core properties
> such as `ID_Start` and `ID_Continue`.

I don't understand that question. :-(

In comment 2 I've copy-pasted the definitions for Other_ID_Start and Other_ID_Continue from PropList.txt.
In comment 4 you've responded to comment 2, and said that DerivedCoreProperties.txt instead of PropList.txt should be used (*).
In comment 5 I've responded to comment 4, and said that Other_ID_Start and Other_ID_Continue are defined in (and only in) PropList.txt.

(*) Most likely this is the point where we started to talk about different subjects. I've continued to say that Other_ID_Start and Other_ID_Continue are defined in PropList.txt. And you were talking about ID_Start and ID_Continue which are defined in DerivedCoreProperties.txt.

commentid: 13673
comment_count: 11
who: Mathias Bynens
bug_when: 2015-03-09 14:38:18 -0700

(In reply to André Bargull from comment #10)
> (In reply to Mathias Bynens from comment #7)
> > Why should it be used over `DerivedCoreProperties.txt`? To me
> > http://www.unicode.org/reports/tr44/#Simple_Derived sounds like
> > `DerivedCoreProperties.txt` is the reference for derived core properties
> > such as `ID_Start` and `ID_Continue`.
>
> I don't understand that question. :-(
>
> In comment 2 I've copy-pasted the definitions for Other_ID_Start and
> Other_ID_Continue from PropList.txt.
> In comment 4 you've responded to comment 2, and said that
> DerivedCoreProperties.txt instead of PropList.txt should be used (*).
> In comment 5 I've responded to comment 4, and said that Other_ID_Start and
> Other_ID_Continue are defined in (and only in) PropList.txt.
>
> (*) Most likely this is the point where we started to talk about different
> subjects. I've continued to say that Other_ID_Start and Other_ID_Continue
> are defined in PropList.txt. And you were talking about ID_Start and
> ID_Continue which are defined in DerivedCoreProperties.txt.

My point was that neither `Other_ID_Start` nor `Other_ID_Continue` is needed if we just use the `ID_Start` and `ID_Continue` listings in `DerivedCoreProperties.txt`, since those already include `Other_ID_Start` and `Other_ID_Continue` respectively.

commentid: 13701
comment_count: 12
who: André Bargull
bug_when: 2015-03-11 17:14:46 -0700

(In reply to Mathias Bynens from comment #11)
> My point was that neither `Other_ID_Start` or `Other_ID_Continue` are needed
> if we just use the `ID_Start` and `ID_Continue` listings in
> `DerivedCoreProperties.txt`, since those include `Other_ID_Start` and
> `Other_ID_Continue` respectively already.

Yeah sure. Do you agree that removing `Other_ID_Start` and `Other_ID_Continue` from the definitions in UnicodeID{Start, Continue} and instead adding a note is a better (cleaner) way to define the set of allowed identifier characters? Because that should solve the whole redundancy issue.

---
UnicodeIDStart ::
any Unicode code point with the Unicode property “ID_Start”

UnicodeIDContinue ::
any Unicode code point with the Unicode property “ID_Continue”

NOTE: Grandfathered characters defined in “Other_ID_Start” and “Other_ID_Continue” must be recognized/supported by a compliant implementation.
---

commentid: 13705
comment_count: 13
who: Mathias Bynens
bug_when: 2015-03-12 02:58:52 -0700

(In reply to André Bargull from comment #12)
> (In reply to Mathias Bynens from comment #11)
> > My point was that neither `Other_ID_Start` or `Other_ID_Continue` are needed
> > if we just use the `ID_Start` and `ID_Continue` listings in
> > `DerivedCoreProperties.txt`, since those include `Other_ID_Start` and
> > `Other_ID_Continue` respectively already.
>
> Yeah sure. Do you agree that removing `Other_ID_Start` and
> `Other_ID_Continue` from the definitions in UnicodeID{Start, Continue} and
> instead adding a note is a better (cleaner) way to define the set of allowed
> identifier characters? Because that should solve the whole redundancy issue.
>
> ---
> UnicodeIDStart ::
> any Unicode code point with the Unicode property “ID_Start”
>
> UnicodeIDContinue ::
> any Unicode code point with the Unicode property “ID_Continue”
>
> NOTE: Grandfathered characters defined in “Other_ID_Start” and
> “Other_ID_Continue” must be recognized/supported by a compliant
> implementation.
> ---

That was my intention when filing bug 2717 :) Sounds good.

commentid: 13707
comment_count: 14
who: Allen Wirfs-Brock
bug_when: 2015-03-12 09:55:21 -0700

(In reply to André Bargull from comment #12)
...
>
> ---
> UnicodeIDStart ::
> any Unicode code point with the Unicode property “ID_Start”
>
> UnicodeIDContinue ::
> any Unicode code point with the Unicode property “ID_Continue”
>
> NOTE: Grandfathered characters defined in “Other_ID_Start” and
> “Other_ID_Continue” must be recognized/supported by a compliant
> implementation.
> ---

A NOTE can't express a normative requirement. However, if we agree that the definitions of ID_Start and ID_Continue are normatively sufficient, then we could have a NOTE that says something like:

NOTE The sets of code points with Unicode properties ID_Start and ID_Continue include, respectively, the code points with Unicode properties Other_ID_Start and Other_ID_Continue.
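The inclusion stated in the proposed NOTE can be spot-checked against the Unicode 7.0 code points listed in comment #2. This is only a sketch: it assumes ES2018 Unicode property escapes, the helper names `expand` and `allMatch` are hypothetical, and later Unicode versions may add further `Other_ID_*` code points that are not covered here.

```javascript
// Expand [first, last] pairs into a flat list of code points.
function expand(ranges) {
  const out = [];
  for (const [first, last] of ranges) {
    for (let cp = first; cp <= last; cp++) out.push(cp);
  }
  return out;
}

// Code points from the PropList.txt excerpt in comment #2 (Unicode 7.0).
const otherIdStart = expand([
  [0x2118, 0x2118],
  [0x212e, 0x212e],
  [0x309b, 0x309c],
]);
const otherIdContinue = expand([
  [0x00b7, 0x00b7],
  [0x0387, 0x0387],
  [0x1369, 0x1371],
  [0x19da, 0x19da],
]);

// True if every code point matches the given single-character pattern.
function allMatch(codePoints, pattern) {
  return codePoints.every((cp) => pattern.test(String.fromCodePoint(cp)));
}

// Assumption: the engine supports \p{ID_Start} and \p{ID_Continue} (ES2018).
const noteHolds =
  allMatch(otherIdStart, /^\p{ID_Start}$/u) &&
  allMatch(otherIdContinue, /^\p{ID_Continue}$/u);

console.log(`NOTE holds for the listed code points: ${noteHolds}`);
```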

commentid: 13785
comment_count: 15
who: Norbert
bug_when: 2015-03-16 16:06:23 -0700

I agree that all normative references to Other_ID_Start and Other_ID_Continue should be removed, and Allen's proposed note should be added.

commentid: 13786
comment_count: 16
who: Allen Wirfs-Brock
bug_when: 2015-03-16 16:29:25 -0700

fixed in rev26 editor's draft

commentid: 13825
comment_count: 17
who: Allen Wirfs-Brock
bug_when: 2015-03-17 16:57:06 -0700

in rev36
