It is a Syntax Error if the List of Unicode code points that is SourceText of UnicodePropertyName is not identical to a List of Unicode code points that is a Unicode property name or property alias listed in the “Property name and aliases” column of Table 1.
It is a Syntax Error if the List of Unicode code points that is SourceText of UnicodePropertyValue is not identical to a List of Unicode code points that is a value or value alias for the Unicode property or property alias given by SourceText of UnicodePropertyName listed in the “Property value and aliases” column of the corresponding tables Table 3 or Table 4.

UnicodePropertyValueExpression::LoneUnicodePropertyNameOrValue

It is a Syntax Error if the List of Unicode code points that is SourceText of LoneUnicodePropertyNameOrValue is not identical to a List of Unicode code points that is a Unicode general category or general category alias listed in the “Property value and aliases” column of Table 3, nor a binary property or binary property alias listed in the “Property name and aliases” column of Table 2.
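The early errors above can be observed from script code. The following is a minimal sketch, assuming an engine that implements this proposal; `compilesWithUnicodeFlag` is a hypothetical helper introduced only for illustration.

```javascript
// Hypothetical helper (not part of the proposal): reports whether a pattern
// source compiles when the `u` flag is set.
function compilesWithUnicodeFlag(source) {
  try {
    new RegExp(source, 'u');
    return true;
  } catch (error) {
    // The rules above are early errors, so they surface as SyntaxError.
    if (error instanceof SyntaxError) return false;
    throw error;
  }
}

const results = {
  // Name from Table 1 with a value from Table 4.
  knownNameAndValue: compilesWithUnicodeFlag('\\p{Script=Latin}'),
  // Misspelled property name: not listed in Table 1.
  unknownName: compilesWithUnicodeFlag('\\p{Skript=Latin}'),
  // Misspelled value: not listed in Table 4.
  unknownValue: compilesWithUnicodeFlag('\\p{Script=Latinn}'),
  // Lone general category alias from Table 3.
  loneGeneralCategory: compilesWithUnicodeFlag('\\p{Lu}'),
  // Lone binary property from Table 2.
  loneBinaryProperty: compilesWithUnicodeFlag('\\p{Alphabetic}'),
  // A Script value is neither a general category nor a binary property.
  loneScriptValue: compilesWithUnicodeFlag('\\p{Latin}'),
};

console.log(results);
```

Only the first, fourth, and fifth patterns are expected to compile; a lone Script value such as `Latin` must be written as `Script=Latin`.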

The following two abstract operations are appended to 21.2.2.8 Atom.

2 Runtime Semantics: UnicodeMatchProperty ( `p` )

The algorithm uses values from the following tables, which map the supported Unicode property names and property aliases to their canonical property names.

Implementations must support the following non-binary Unicode properties and their property aliases:

Table 1: Non-binary Unicode property aliases and their canonical property names

Property name and aliases	Canonical property name
`General_Category` `gc`	`General_Category`
`Script` `sc`	`Script`
`Script_Extensions` `scx`	`Script_Extensions`

Additionally, implementations must support the following binary Unicode properties and their property aliases:

Table 2: Binary Unicode property aliases and their canonical property names

Property name and aliases	Canonical property name
`ASCII`	`ASCII`
`ASCII_Hex_Digit` `AHex`	`ASCII_Hex_Digit`
`Alphabetic` `Alpha`	`Alphabetic`
`Any`	`Any`
`Assigned`	`Assigned`
`Bidi_Control` `Bidi_C`	`Bidi_Control`
`Bidi_Mirrored` `Bidi_M`	`Bidi_Mirrored`
`Case_Ignorable` `CI`	`Case_Ignorable`
`Cased`	`Cased`
`Changes_When_Casefolded` `CWCF`	`Changes_When_Casefolded`
`Changes_When_Casemapped` `CWCM`	`Changes_When_Casemapped`
`Changes_When_Lowercased` `CWL`	`Changes_When_Lowercased`
`Changes_When_NFKC_Casefolded` `CWKCF`	`Changes_When_NFKC_Casefolded`
`Changes_When_Titlecased` `CWT`	`Changes_When_Titlecased`
`Changes_When_Uppercased` `CWU`	`Changes_When_Uppercased`
`Dash`	`Dash`
`Default_Ignorable_Code_Point` `DI`	`Default_Ignorable_Code_Point`
`Deprecated` `Dep`	`Deprecated`
`Diacritic` `Dia`	`Diacritic`
`Emoji`	`Emoji`
`Emoji_Component`	`Emoji_Component`
`Emoji_Modifier`	`Emoji_Modifier`
`Emoji_Modifier_Base`	`Emoji_Modifier_Base`
`Emoji_Presentation`	`Emoji_Presentation`
`Extender` `Ext`	`Extender`
`Grapheme_Base` `Gr_Base`	`Grapheme_Base`
`Grapheme_Extend` `Gr_Ext`	`Grapheme_Extend`
`Hex_Digit` `Hex`	`Hex_Digit`
`IDS_Binary_Operator` `IDSB`	`IDS_Binary_Operator`
`IDS_Trinary_Operator` `IDST`	`IDS_Trinary_Operator`
`ID_Continue` `IDC`	`ID_Continue`
`ID_Start` `IDS`	`ID_Start`
`Ideographic` `Ideo`	`Ideographic`
`Join_Control` `Join_C`	`Join_Control`
`Logical_Order_Exception` `LOE`	`Logical_Order_Exception`
`Lowercase` `Lower`	`Lowercase`
`Math`	`Math`
`Noncharacter_Code_Point` `NChar`	`Noncharacter_Code_Point`
`Pattern_Syntax` `Pat_Syn`	`Pattern_Syntax`
`Pattern_White_Space` `Pat_WS`	`Pattern_White_Space`
`Quotation_Mark` `QMark`	`Quotation_Mark`
`Radical`	`Radical`
`Regional_Indicator` `RI`	`Regional_Indicator`
`Sentence_Terminal` `STerm`	`Sentence_Terminal`
`Soft_Dotted` `SD`	`Soft_Dotted`
`Terminal_Punctuation` `Term`	`Terminal_Punctuation`
`Unified_Ideograph` `UIdeo`	`Unified_Ideograph`
`Uppercase` `Upper`	`Uppercase`
`Variation_Selector` `VS`	`Variation_Selector`
`White_Space` `space`	`White_Space`
`XID_Continue` `XIDC`	`XID_Continue`
`XID_Start` `XIDS`	`XID_Start`

The abstract operation UnicodeMatchProperty takes a parameter p that is a List of Unicode code points and performs the following steps:

Assert: p is a List of Unicode code points that is identical to a List of Unicode code points that is a Unicode property name or property alias listed in the “Property name and aliases” column of Table 1 or Table 2.
Let c be the canonical property name of p as given in the “Canonical property name” column of the corresponding row.
Return the List of Unicode code points of c.
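The steps above amount to an exact, case-sensitive table lookup. The following is a rough sketch, not a normative implementation: it assumes strings stand in for Lists of Unicode code points, and `PROPERTY_ALIASES` is a hypothetical excerpt containing only a few rows of Table 1 and Table 2.

```javascript
// Hypothetical excerpt of Table 1 and Table 2: each name or alias maps to
// its canonical property name. A full implementation would list every row.
const PROPERTY_ALIASES = new Map([
  ['General_Category', 'General_Category'],
  ['gc', 'General_Category'],
  ['Script', 'Script'],
  ['sc', 'Script'],
  ['Script_Extensions', 'Script_Extensions'],
  ['scx', 'Script_Extensions'],
  ['Alphabetic', 'Alphabetic'],
  ['Alpha', 'Alphabetic'],
  ['White_Space', 'White_Space'],
  ['space', 'White_Space'],
]);

// Sketch of UnicodeMatchProperty, with strings in place of code point Lists.
function unicodeMatchProperty(p) {
  // Step 1 (Assert): p must match a listed name or alias exactly.
  if (!PROPERTY_ALIASES.has(p)) {
    throw new Error(`Assertion failed: unsupported property name "${p}"`);
  }
  // Steps 2 and 3: look up and return the canonical property name.
  return PROPERTY_ALIASES.get(p);
}

console.log(unicodeMatchProperty('scx'));   // Script_Extensions
console.log(unicodeMatchProperty('Alpha')); // Alphabetic
```

Because the lookup is exact, `unicodeMatchProperty('Scx')` fails the assertion in this sketch; in a conforming engine such a name is rejected earlier, as a Syntax Error.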

To ensure interoperability, implementations must not extend Unicode property support to the remaining properties.

Implementations must only recognize the property aliases listed in Table 1 and Table 2.

Implementations must only recognize the property value aliases and canonical property value names listed in Table 3 and Table 4.

Note 1

For example, Script_Extensions (property name) and scx (property alias) are valid, but script_extensions and Scx are not.

Note 2

The listed properties form a superset of what UTS18 RL1.2 requires.

3 Runtime Semantics: UnicodeMatchPropertyValue ( `p`, `v` )

The algorithm uses values from the following tables, which associate canonical Unicode property names and their supported values and value aliases:

Table 3: Value aliases and canonical values for the Unicode property General_Category

Property value and aliases	Canonical property value
`Cased_Letter` `LC`	`Cased_Letter`
`Close_Punctuation` `Pe`	`Close_Punctuation`
`Connector_Punctuation` `Pc`	`Connector_Punctuation`
`Control` `Cc` `cntrl`	`Control`
`Currency_Symbol` `Sc`	`Currency_Symbol`
`Dash_Punctuation` `Pd`	`Dash_Punctuation`
`Decimal_Number` `Nd` `digit`	`Decimal_Number`
`Enclosing_Mark` `Me`	`Enclosing_Mark`
`Final_Punctuation` `Pf`	`Final_Punctuation`
`Format` `Cf`	`Format`
`Initial_Punctuation` `Pi`	`Initial_Punctuation`
`Letter` `L`	`Letter`
`Letter_Number` `Nl`	`Letter_Number`
`Line_Separator` `Zl`	`Line_Separator`
`Lowercase_Letter` `Ll`	`Lowercase_Letter`
`Mark` `M` `Combining_Mark`	`Mark`
`Math_Symbol` `Sm`	`Math_Symbol`
`Modifier_Letter` `Lm`	`Modifier_Letter`
`Modifier_Symbol` `Sk`	`Modifier_Symbol`
`Nonspacing_Mark` `Mn`	`Nonspacing_Mark`
`Number` `N`	`Number`
`Open_Punctuation` `Ps`	`Open_Punctuation`
`Other` `C`	`Other`
`Other_Letter` `Lo`	`Other_Letter`
`Other_Number` `No`	`Other_Number`
`Other_Punctuation` `Po`	`Other_Punctuation`
`Other_Symbol` `So`	`Other_Symbol`
`Paragraph_Separator` `Zp`	`Paragraph_Separator`
`Private_Use` `Co`	`Private_Use`
`Punctuation` `P` `punct`	`Punctuation`
`Separator` `Z`	`Separator`
`Space_Separator` `Zs`	`Space_Separator`
`Spacing_Mark` `Mc`	`Spacing_Mark`
`Surrogate` `Cs`	`Surrogate`
`Symbol` `S`	`Symbol`
`Titlecase_Letter` `Lt`	`Titlecase_Letter`
`Unassigned` `Cn`	`Unassigned`
`Uppercase_Letter` `Lu`	`Uppercase_Letter`

Table 4: Value aliases and canonical values for the Unicode properties Script and Script_Extensions

Property value and aliases	Canonical property value
`Adlam` `Adlm`	`Adlam`
`Ahom` `Ahom`	`Ahom`
`Anatolian_Hieroglyphs` `Hluw`	`Anatolian_Hieroglyphs`
`Arabic` `Arab`	`Arabic`
`Armenian` `Armn`	`Armenian`
`Avestan` `Avst`	`Avestan`
`Balinese` `Bali`	`Balinese`
`Bamum` `Bamu`	`Bamum`
`Bassa_Vah` `Bass`	`Bassa_Vah`
`Batak` `Batk`	`Batak`
`Bengali` `Beng`	`Bengali`
`Bhaiksuki` `Bhks`	`Bhaiksuki`
`Bopomofo` `Bopo`	`Bopomofo`
`Brahmi` `Brah`	`Brahmi`
`Braille` `Brai`	`Braille`
`Buginese` `Bugi`	`Buginese`
`Buhid` `Buhd`	`Buhid`
`Canadian_Aboriginal` `Cans`	`Canadian_Aboriginal`
`Carian` `Cari`	`Carian`
`Caucasian_Albanian` `Aghb`	`Caucasian_Albanian`
`Chakma` `Cakm`	`Chakma`
`Cham` `Cham`	`Cham`
`Cherokee` `Cher`	`Cherokee`
`Common` `Zyyy`	`Common`
`Coptic` `Copt` `Qaac`	`Coptic`
`Cuneiform` `Xsux`	`Cuneiform`
`Cypriot` `Cprt`	`Cypriot`
`Cyrillic` `Cyrl`	`Cyrillic`
`Deseret` `Dsrt`	`Deseret`
`Devanagari` `Deva`	`Devanagari`
`Duployan` `Dupl`	`Duployan`
`Egyptian_Hieroglyphs` `Egyp`	`Egyptian_Hieroglyphs`
`Elbasan` `Elba`	`Elbasan`
`Ethiopic` `Ethi`	`Ethiopic`
`Georgian` `Geor`	`Georgian`
`Glagolitic` `Glag`	`Glagolitic`
`Gothic` `Goth`	`Gothic`
`Grantha` `Gran`	`Grantha`
`Greek` `Grek`	`Greek`
`Gujarati` `Gujr`	`Gujarati`
`Gurmukhi` `Guru`	`Gurmukhi`
`Han` `Hani`	`Han`
`Hangul` `Hang`	`Hangul`
`Hanunoo` `Hano`	`Hanunoo`
`Hatran` `Hatr`	`Hatran`
`Hebrew` `Hebr`	`Hebrew`
`Hiragana` `Hira`	`Hiragana`
`Imperial_Aramaic` `Armi`	`Imperial_Aramaic`
`Inherited` `Zinh` `Qaai`	`Inherited`
`Inscriptional_Pahlavi` `Phli`	`Inscriptional_Pahlavi`
`Inscriptional_Parthian` `Prti`	`Inscriptional_Parthian`
`Javanese` `Java`	`Javanese`
`Kaithi` `Kthi`	`Kaithi`
`Kannada` `Knda`	`Kannada`
`Katakana` `Kana`	`Katakana`
`Kayah_Li` `Kali`	`Kayah_Li`
`Kharoshthi` `Khar`	`Kharoshthi`
`Khmer` `Khmr`	`Khmer`
`Khojki` `Khoj`	`Khojki`
`Khudawadi` `Sind`	`Khudawadi`
`Lao` `Laoo`	`Lao`
`Latin` `Latn`	`Latin`
`Lepcha` `Lepc`	`Lepcha`
`Limbu` `Limb`	`Limbu`
`Linear_A` `Lina`	`Linear_A`
`Linear_B` `Linb`	`Linear_B`
`Lisu` `Lisu`	`Lisu`
`Lycian` `Lyci`	`Lycian`
`Lydian` `Lydi`	`Lydian`
`Mahajani` `Mahj`	`Mahajani`
`Malayalam` `Mlym`	`Malayalam`
`Mandaic` `Mand`	`Mandaic`
`Manichaean` `Mani`	`Manichaean`
`Marchen` `Marc`	`Marchen`
`Masaram_Gondi` `Gonm`	`Masaram_Gondi`
`Meetei_Mayek` `Mtei`	`Meetei_Mayek`
`Mende_Kikakui` `Mend`	`Mende_Kikakui`
`Meroitic_Cursive` `Merc`	`Meroitic_Cursive`
`Meroitic_Hieroglyphs` `Mero`	`Meroitic_Hieroglyphs`
`Miao` `Plrd`	`Miao`
`Modi` `Modi`	`Modi`
`Mongolian` `Mong`	`Mongolian`
`Mro` `Mroo`	`Mro`
`Multani` `Mult`	`Multani`
`Myanmar` `Mymr`	`Myanmar`
`Nabataean` `Nbat`	`Nabataean`
`New_Tai_Lue` `Talu`	`New_Tai_Lue`
`Newa` `Newa`	`Newa`
`Nko` `Nkoo`	`Nko`
`Nushu` `Nshu`	`Nushu`
`Ogham` `Ogam`	`Ogham`
`Ol_Chiki` `Olck`	`Ol_Chiki`
`Old_Hungarian` `Hung`	`Old_Hungarian`
`Old_Italic` `Ital`	`Old_Italic`
`Old_North_Arabian` `Narb`	`Old_North_Arabian`
`Old_Permic` `Perm`	`Old_Permic`
`Old_Persian` `Xpeo`	`Old_Persian`
`Old_South_Arabian` `Sarb`	`Old_South_Arabian`
`Old_Turkic` `Orkh`	`Old_Turkic`
`Oriya` `Orya`	`Oriya`
`Osage` `Osge`	`Osage`
`Osmanya` `Osma`	`Osmanya`
`Pahawh_Hmong` `Hmng`	`Pahawh_Hmong`
`Palmyrene` `Palm`	`Palmyrene`
`Pau_Cin_Hau` `Pauc`	`Pau_Cin_Hau`
`Phags_Pa` `Phag`	`Phags_Pa`
`Phoenician` `Phnx`	`Phoenician`
`Psalter_Pahlavi` `Phlp`	`Psalter_Pahlavi`
`Rejang` `Rjng`	`Rejang`
`Runic` `Runr`	`Runic`
`Samaritan` `Samr`	`Samaritan`
`Saurashtra` `Saur`	`Saurashtra`
`Sharada` `Shrd`	`Sharada`
`Shavian` `Shaw`	`Shavian`
`Siddham` `Sidd`	`Siddham`
`SignWriting` `Sgnw`	`SignWriting`
`Sinhala` `Sinh`	`Sinhala`
`Sora_Sompeng` `Sora`	`Sora_Sompeng`
`Soyombo` `Soyo`	`Soyombo`
`Sundanese` `Sund`	`Sundanese`
`Syloti_Nagri` `Sylo`	`Syloti_Nagri`
`Syriac` `Syrc`	`Syriac`
`Tagalog` `Tglg`	`Tagalog`
`Tagbanwa` `Tagb`	`Tagbanwa`
`Tai_Le` `Tale`	`Tai_Le`
`Tai_Tham` `Lana`	`Tai_Tham`
`Tai_Viet` `Tavt`	`Tai_Viet`
`Takri` `Takr`	`Takri`
`Tamil` `Taml`	`Tamil`
`Tangut` `Tang`	`Tangut`
`Telugu` `Telu`	`Telugu`
`Thaana` `Thaa`	`Thaana`
`Thai` `Thai`	`Thai`
`Tibetan` `Tibt`	`Tibetan`
`Tifinagh` `Tfng`	`Tifinagh`
`Tirhuta` `Tirh`	`Tirhuta`
`Ugaritic` `Ugar`	`Ugaritic`
`Vai` `Vaii`	`Vai`
`Warang_Citi` `Wara`	`Warang_Citi`
`Yi` `Yiii`	`Yi`
`Zanabazar_Square` `Zanb`	`Zanabazar_Square`

The abstract operation UnicodeMatchPropertyValue takes two parameters p and v, each of which is a List of Unicode code points, and performs the following steps:

Assert: p is a List of Unicode code points that is identical to a List of Unicode code points that is a canonical, unaliased Unicode property name listed in the “Canonical property name” column of Table 1.
Assert: v is a List of Unicode code points that is identical to a List of Unicode code points that is a property value or property value alias for Unicode property p listed in the “Property value and aliases” column of Table 3 or Table 4.
Let value be the canonical property value of v as given in the “Canonical property value” column of the corresponding row.
Return the List of Unicode code points of value.
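As a rough sketch of the steps above, the lookup can be modelled as one value table per canonical property name. This is an illustration under stated assumptions, not a normative implementation: strings stand in for Lists of Unicode code points, and the hypothetical tables below contain only a handful of rows from Table 3 and Table 4.

```javascript
// Hypothetical excerpt of Table 3 (General_Category values and aliases).
const GENERAL_CATEGORY_VALUES = new Map([
  ['Uppercase_Letter', 'Uppercase_Letter'],
  ['Lu', 'Uppercase_Letter'],
  ['Decimal_Number', 'Decimal_Number'],
  ['Nd', 'Decimal_Number'],
  ['digit', 'Decimal_Number'],
]);

// Hypothetical excerpt of Table 4 (Script and Script_Extensions values).
const SCRIPT_VALUES = new Map([
  ['Old_Persian', 'Old_Persian'],
  ['Xpeo', 'Old_Persian'],
  ['Greek', 'Greek'],
  ['Grek', 'Greek'],
]);

// Script and Script_Extensions share Table 4.
const VALUE_TABLES = new Map([
  ['General_Category', GENERAL_CATEGORY_VALUES],
  ['Script', SCRIPT_VALUES],
  ['Script_Extensions', SCRIPT_VALUES],
]);

// Sketch of UnicodeMatchPropertyValue, with strings in place of code point Lists.
function unicodeMatchPropertyValue(p, v) {
  // First Assert: p is a canonical property name from Table 1.
  const table = VALUE_TABLES.get(p);
  if (table === undefined) {
    throw new Error(`Assertion failed: "${p}" is not a canonical property name`);
  }
  // Second Assert: v is a listed value or value alias for p.
  if (!table.has(v)) {
    throw new Error(`Assertion failed: "${v}" is not a value of ${p}`);
  }
  // Remaining steps: return the canonical property value.
  return table.get(v);
}

console.log(unicodeMatchPropertyValue('Script_Extensions', 'Xpeo')); // Old_Persian
console.log(unicodeMatchPropertyValue('General_Category', 'digit')); // Decimal_Number
```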

Implementations must recognize the canonical property values and property value aliases listed in Table 3 and Table 4, and must not recognize any others.

Note 1

For example, Xpeo and Old_Persian are valid Script_Extensions values, but xpeo and Old Persian aren’t.

Note 2

This algorithm differs from the matching rules for symbolic values listed in UAX44: case, white space, U+002D (HYPHEN-MINUS), and U+005F (LOW LINE) are not ignored, and the Is prefix is not supported.
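To make the contrast concrete, the sketch below compares an approximation of UAX44-style loose matching with what a regular expression accepts. It assumes an engine that implements this proposal. Both `looseKey` and `strictlyAccepted` are hypothetical helpers, and `looseKey` is only a simplified reading of the loose matching rule, not a complete implementation of it.

```javascript
// Hypothetical approximation of loose matching: ignore case, white space,
// U+002D, U+005F, and an initial "is" prefix. NOT used by this proposal.
function looseKey(name) {
  return name.toLowerCase().replace(/[\s_-]/g, '').replace(/^is/, '');
}

// Hypothetical helper: does the engine accept the spelling as a Script value?
function strictlyAccepted(value) {
  try {
    new RegExp(`\\p{Script=${value}}`, 'u');
    return true;
  } catch (error) {
    if (error instanceof SyntaxError) return false;
    throw error;
  }
}

const candidates = [
  'Old_Persian',
  'old_persian',
  'Old Persian',
  'Old-Persian',
  'IsOld_Persian',
];

const comparison = candidates.map((value) => ({
  value,
  loose: looseKey(value) === looseKey('Old_Persian'),
  strict: strictlyAccepted(value),
}));

console.table(comparison);
```

Under these assumptions every candidate is a loose match for `Old_Persian`, but only the exact spelling `Old_Persian` is accepted in a pattern.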

The following is appended to the list of productions in 21.2.2.12 CharacterClassEscape.

The production CharacterClassEscape::\p{UnicodePropertyValueExpression} evaluates by returning the CharSet containing all Unicode code points included in the CharSet returned by UnicodePropertyValueExpression.

The production CharacterClassEscape::\P{UnicodePropertyValueExpression} evaluates by returning the CharSet containing all Unicode code points not included in the CharSet returned by UnicodePropertyValueExpression.
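A short illustration of the two productions, assuming an engine that implements this proposal: for any single code point, exactly one of `\p{…}` and `\P{…}` matches. The sample characters are chosen arbitrarily.

```javascript
// \p{Lu} matches code points in General_Category=Uppercase_Letter;
// \P{Lu} matches every code point that is not in that set.
const upper = /^\p{Lu}$/u;
const notUpper = /^\P{Lu}$/u;

const samples = ['A', 'a', 'Σ', '1'];

const rows = samples.map((ch) => ({
  ch,
  matchesLowerP: upper.test(ch),
  matchesUpperP: notUpper.test(ch),
}));

// The two CharSets are complements, so the results always differ.
const complementary = rows.every(
  (row) => row.matchesLowerP !== row.matchesUpperP
);

console.table(rows);
console.log(complementary); // true
```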

The production UnicodePropertyValueExpression::UnicodePropertyName=UnicodePropertyValue evaluates as follows:

Let ps be SourceText of UnicodePropertyName.
Let p be ! UnicodeMatchProperty(ps).
Assert: p is a Unicode property name or property alias listed in the “Property name and aliases” column of Table 1.
Let vs be SourceText of UnicodePropertyValue.
Let v be ! UnicodeMatchPropertyValue(p, vs).
Return the CharSet containing all Unicode code points whose character database definition includes the property p with value v.
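Because both the property name and the value are canonicalized before the CharSet is built, every combination of name alias and value alias selects the same code points. A small sketch, assuming an engine that implements this proposal; the sample text is arbitrary.

```javascript
// Four spellings that canonicalize to Script=Greek.
const spellings = ['Script=Greek', 'Script=Grek', 'sc=Greek', 'sc=Grek'];

const text = 'αβγ abc';

// For each spelling, collect every code point in the text that matches.
const matches = spellings.map((spelling) => {
  const pattern = new RegExp(`\\p{${spelling}}`, 'gu');
  return text.match(pattern).join('');
});

console.log(matches); // each entry is 'αβγ'
```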

The production UnicodePropertyValueExpression::LoneUnicodePropertyNameOrValue evaluates as follows:

Let s be SourceText of LoneUnicodePropertyNameOrValue.
If ! UnicodeMatchPropertyValue("General_Category", s) is identical to a List of Unicode code points that is the name of a Unicode general category or general category alias listed in the “Property value and aliases” column of Table 3, then
1. Return the CharSet containing all Unicode code points whose character database definition includes the property General_Category with value s.
Let p be ! UnicodeMatchProperty(s).
Assert: p is a binary Unicode property or binary property alias listed in the “Property name and aliases” column of Table 2.
Return the CharSet containing all Unicode code points whose character database definition includes the property p with value True.
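In practice, a lone name is first tried as a General_Category value and otherwise treated as a binary property. The sketch below, assuming an engine that implements this proposal, shows that `\p{Lu}` behaves like its explicit `General_Category=` form, while `\p{Alpha}` selects code points whose Alphabetic property is True.

```javascript
// Lone general category alias and its explicit equivalent.
const loneCategory = /^\p{Lu}$/u;
const explicitCategory = /^\p{General_Category=Uppercase_Letter}$/u;

// Lone binary property alias (Alpha is an alias of Alphabetic in Table 2).
const loneBinary = /^\p{Alpha}$/u;

const samples = ['A', 'a', '5', '_'];

const rows = samples.map((ch) => ({
  ch,
  loneCategory: loneCategory.test(ch),
  explicitCategory: explicitCategory.test(ch),
  alphabetic: loneBinary.test(ch),
}));

console.table(rows);
```

For these samples the `loneCategory` and `explicitCategory` columns agree in every row, and only the letters `A` and `a` are Alphabetic.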

The following is appended to the bibliography.

A Bibliography

Unicode Technical Standard #18: Unicode Regular Expressions, available at <https://unicode.org/reports/tr18/>
Unicode Standard Annex #24: Unicode Script Property, available at <https://unicode.org/reports/tr24/>
Unicode Standard Annex #44: Unicode Character Database, available at <https://unicode.org/reports/tr44/>
Unicode Technical Standard #51: Unicode Emoji, available at <https://unicode.org/reports/tr51/>

Stage 4 Draft / January 24, 2018

Unicode property escapes in regular expressions
