Stage 3 Draft / March 22, 2017

RegExp Lookbehind Assertions

1Introduction

Lookarounds are zero-width assertions that match a string without consuming anything. ECMAScript has lookahead assertions that does this in forward direction, but the language is missing a way to do this backward which the lookbehind assertions provide. With lookbehind assertions, one can make sure that a pattern is or isn't preceded by another, e.g. matching a dollar amount without capturing the dollar sign.

See the proposal repository for background material and discussion.

2Pattern Syntax

Pattern[U]::Disjunction[?U] Disjunction[U]::Alternative[?U] Alternative[?U]|Disjunction[?U] Alternative[U]::[empty] Alternative[?U]Term[?U] Term[U]::Assertion[?U] Atom[?U] Atom[?U]Quantifier Assertion[U]::^ $ \b \B (?=Disjunction[?U]) (?!Disjunction[?U]) (?<=Disjunction[?U]) (?<!Disjunction[?U]) Quantifier::QuantifierPrefix QuantifierPrefix? QuantifierPrefix::* + ? {DecimalDigits} {DecimalDigits,} {DecimalDigits,DecimalDigits} Atom[U]::PatternCharacter . \AtomEscape[?U] CharacterClass[?U] (Disjunction[?U]) (?:Disjunction[?U]) SyntaxCharacter::one of^$\.*+?()[]{}| PatternCharacter::SourceCharacterbut not SyntaxCharacter AtomEscape[U]::DecimalEscape CharacterClassEscape CharacterEscape[?U] CharacterEscape[U]::ControlEscape cControlLetter 0[lookahead ∉ DecimalDigit] HexEscapeSequence RegExpUnicodeEscapeSequence[?U] IdentityEscape[?U] ControlEscape::one offnrtv ControlLetter::one ofabcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ RegExpUnicodeEscapeSequence[U]::[+U]uLeadSurrogate\uTrailSurrogate [+U]uLeadSurrogate [+U]uTrailSurrogate [+U]uNonSurrogate [~U]uHex4Digits [+U]u{HexDigits}

Each \u TrailSurrogate for which the choice of associated u LeadSurrogate is ambiguous shall be associated with the nearest possible u LeadSurrogate that would otherwise have no corresponding \u TrailSurrogate.

LeadSurrogate::Hex4Digitsbut only if the SV of Hex4Digits is in the inclusive range 0xD800 to 0xDBFF TrailSurrogate::Hex4Digitsbut only if the SV of Hex4Digits is in the inclusive range 0xDC00 to 0xDFFF NonSurrogate::Hex4Digitsbut only if the SV of Hex4Digits is not in the inclusive range 0xD800 to 0xDFFF IdentityEscape[U]::[+U]SyntaxCharacter [+U]/ [~U]SourceCharacterbut not UnicodeIDContinue DecimalEscape::NonZeroDigitDecimalDigitsopt[lookahead ∉ DecimalDigit] CharacterClassEscape::one ofdDsSwW CharacterClass[U]::[[lookahead ∉ { ^ }]ClassRanges[?U]] [^ClassRanges[?U]] ClassRanges[U]::[empty] NonemptyClassRanges[?U] NonemptyClassRanges[U]::ClassAtom[?U] ClassAtom[?U]NonemptyClassRangesNoDash[?U] ClassAtom[?U]-ClassAtom[?U]ClassRanges[?U] NonemptyClassRangesNoDash[U]::ClassAtom[?U] ClassAtomNoDash[?U]NonemptyClassRangesNoDash[?U] ClassAtomNoDash[?U]-ClassAtom[?U]ClassRanges[?U] ClassAtom[U]::- ClassAtomNoDash[?U] ClassAtomNoDash[U]::SourceCharacterbut not one of \ or ] or - \ClassEscape[?U] ClassEscape[U]::b [+U]- CharacterClassEscape CharacterEscape[?U]

3Pattern Semantics

3.1Pattern

The production Pattern::Disjunction evaluates as follows:

  1. Evaluate Disjunction with +1 as its direction argument to obtain a Matcher m.
  2. Return an internal closure that takes two arguments, a String str and an integer index, and performs the following steps:
    1. Assert: index ≤ the number of elements in str.
    2. If Unicode is true, let Input be a List consisting of the sequence of code points of str interpreted as a UTF-16 encoded (6.1.4) Unicode string. Otherwise, let Input be a List consisting of the sequence of code units that are the elements of str. Input will be used throughout the algorithms in 21.2.2. Each element of Input is considered to be a character.
    3. Let InputLength be the number of characters contained in Input. This variable will be used throughout the algorithms in 21.2.2.
    4. Let listIndex be the index into Input of the character that was obtained from element index of str.
    5. Let c be a Continuation that always returns its State argument as a successful MatchResult.
    6. Let cap be a List of NcapturingParens undefined values, indexed 1 through NcapturingParens.
    7. Let x be the State (listIndex, cap).
    8. Call m(x, c) and return its result.
Note

A Pattern evaluates (“compiles”) to an internal procedure value. RegExpBuiltinExec can then apply this procedure to a String and an offset within the String to determine whether the pattern would match starting at exactly that offset within the String, and, if it does match, what the values of the capturing parentheses would be. The algorithms in 21.2.2 are designed so that compiling a pattern may throw a SyntaxError exception; on the other hand, once the pattern is successfully compiled, applying the resulting internal procedure to find a match in a String cannot throw an exception (except for any host-defined exceptions that can occur anywhere such as out-of-memory).

3.2Disjunction

With argument direction.

The production Disjunction::Alternative evaluates by evaluating Alternative to obtain a Matcher and returning that Matcher.

The production Disjunction::Alternative|Disjunction evaluates as follows:

  1. Evaluate Alternative with argument direction to obtain a Matcher m1.
  2. Evaluate Disjunction with argument direction to obtain a Matcher m2.
  3. Return an internal Matcher closure that takes two arguments, a State x and a Continuation c, and performs the following steps when evaluated:
    1. Call m1(x, c) and let r be its result.
    2. If r is not failure, return r.
    3. Call m2(x, c) and return its result.
Note

The | regular expression operator separates two alternatives. The pattern first tries to match the left Alternative (followed by the sequel of the regular expression); if it fails, it tries to match the right Disjunction (followed by the sequel of the regular expression). If the left Alternative, the right Disjunction, and the sequel all have choice points, all choices in the sequel are tried before moving on to the next choice in the left Alternative. If choices in the left Alternative are exhausted, the right Disjunction is tried instead of the left Alternative. Any capturing parentheses inside a portion of the pattern skipped by | produce undefined values instead of Strings. Thus, for example,

/a|ab/.exec("abc")

returns the result "a" and not "ab". Moreover,

/((a)|(ab))((c)|(bc))/.exec("abc")

returns the array

["abc", "a", "a", undefined, "bc", undefined, "bc"]

and not

["abc", "ab", undefined, "ab", "c", "c", undefined]

The order in which the two alternatives are tried is independent of the value of direction.

3.3Alternative

With argument direction.

The production Alternative::[empty] evaluates by returning a Matcher that takes two arguments, a State x and a Continuation c, and returns the result of calling c(x).

The production Alternative::AlternativeTerm evaluates as follows:

  1. Evaluate Alternative with argument direction to obtain a Matcher m1.
  2. Evaluate Term with argument direction to obtain a Matcher m2.
  3. If direction is equal to +1, then,
    1. Return an internal Matcher closure that takes two arguments, a State x and a Continuation c, and performs the following steps when evaluated:
      1. Let d be a Continuation that takes a State argument y and returns the result of calling m2(y, c).
      2. Call m1(x, d) and return its result.
  4. Else, direction is equal to -1.
    1. Return an internal Matcher closure that takes two arguments, a State x and a Continuation c, and performs the following steps when evaluated:
      1. Let d be a Continuation that takes a State argument y and returns the result of calling m1(y, c).
      2. Call m2(x, d) and return its result.
Note

Consecutive Terms try to simultaneously match consecutive portions of Input. When direction is equal to +1, if the left Alternative, the right Term, and the sequel of the regular expression all have choice points, all choices in the sequel are tried before moving on to the next choice in the right Term, and all choices in the right Term are tried before moving on to the next choice in the left Alternative.

When direction is equal to -1, the evaluation order of Alternative and Term are reversed.

3.4Term

With argument direction.

The production Term::Assertion evaluates by returning an internal Matcher closure that takes two arguments, a State x and a Continuation c, and performs the following steps when evaluated:

  1. Evaluate Assertion to obtain an AssertionTester t.
  2. Call t(x) and let r be the resulting Boolean value.
  3. If r is false, return failure.
  4. Call c(x) and return its result.
Note

The AssertionTester is independent of direction.

The production Term::Atom evaluates as follows:

  1. Return the Matcher that is the result of evaluating Atom with argument direction.

The production Term::AtomQuantifier evaluates as follows:

  1. Evaluate Atom with argument direction to obtain a Matcher m.
  2. Evaluate Quantifier to obtain the three results: an integer min, an integer (or ∞) max, and Boolean greedy.
  3. If max is finite and less than min, throw a SyntaxError exception.
  4. Let parenIndex be the number of left capturing parentheses in the entire regular expression that occur to the left of this production expansion's Term. This is the total number of times the Atom::(Disjunction) production is expanded prior to this production's Term plus the total number of Atom::(Disjunction) productions enclosing this Term.
  5. Let parenCount be the number of left capturing parentheses in the expansion of this production's Atom. This is the total number of Atom::(Disjunction) productions enclosed by this production's Atom.
  6. Return an internal Matcher closure that takes two arguments, a State x and a Continuation c, and performs the following steps when evaluated:
    1. Call RepeatMatcher(m, min, max, greedy, x, c, parenIndex, parenCount) and return its result.

3.4.1Runtime Semantics: RepeatMatcher Abstract Operation

The abstract operation RepeatMatcher takes eight parameters, a Matcher m, an integer min, an integer (or ∞) max, a Boolean greedy, a State x, a Continuation c, an integer parenIndex, and an integer parenCount, and performs the following steps:

  1. If max is zero, return c(x).
  2. Let d be an internal Continuation closure that takes one State argument y and performs the following steps when evaluated:
    1. If min is zero and y's endIndex is equal to x's endIndex, return failure.
    2. If min is zero, let min2 be zero; otherwise let min2 be min-1.
    3. If max is ∞, let max2 be ∞; otherwise let max2 be max-1.
    4. Call RepeatMatcher(m, min2, max2, greedy, y, c, parenIndex, parenCount) and return its result.
  3. Let cap be a fresh copy of x's captures List.
  4. For each integer k that satisfies parenIndex < k and kparenIndex+parenCount, set cap[k] to undefined.
  5. Let e be x's endIndex.
  6. Let xr be the State (e, cap).
  7. If min is not zero, return m(xr, d).
  8. If greedy is false, then
    1. Call c(x) and let z be its result.
    2. If z is not failure, return z.
    3. Call m(xr, d) and return its result.
  9. Call m(xr, d) and let z be its result.
  10. If z is not failure, return z.
  11. Call c(x) and return its result.
Note 1

An Atom followed by a Quantifier is repeated the number of times specified by the Quantifier. A Quantifier can be non-greedy, in which case the Atom pattern is repeated as few times as possible while still matching the sequel, or it can be greedy, in which case the Atom pattern is repeated as many times as possible while still matching the sequel. The Atom pattern is repeated rather than the input character sequence that it matches, so different repetitions of the Atom can match different input substrings.

Note 2

If the Atom and the sequel of the regular expression all have choice points, the Atom is first matched as many (or as few, if non-greedy) times as possible. All choices in the sequel are tried before moving on to the next choice in the last repetition of Atom. All choices in the last (nth) repetition of Atom are tried before moving on to the next choice in the next-to-last (n-1)st repetition of Atom; at which point it may turn out that more or fewer repetitions of Atom are now possible; these are exhausted (again, starting with either as few or as many as possible) before moving on to the next choice in the (n-1)st repetition of Atom and so on.

Compare

/a[a-z]{2,4}/.exec("abcdefghi")

which returns "abcde" with

/a[a-z]{2,4}?/.exec("abcdefghi")

which returns "abc".

Consider also

/(aa|aabaac|ba|b|c)*/.exec("aabaac")

which, by the choice point ordering above, returns the array

["aaba", "ba"]

and not any of:

["aabaac", "aabaac"]
["aabaac", "c"]

The above ordering of choice points can be used to write a regular expression that calculates the greatest common divisor of two numbers (represented in unary notation). The following example calculates the gcd of 10 and 15:

"aaaaaaaaaa,aaaaaaaaaaaaaaa".replace(/^(a+)\1*,\1+$/,"$1")

which returns the gcd in unary notation "aaaaa".

Note 3

Step 4 of the RepeatMatcher clears Atom's captures each time Atom is repeated. We can see its behaviour in the regular expression

/(z)((a+)?(b+)?(c))*/.exec("zaacbbbcac")

which returns the array

["zaacbbbcac", "z", "ac", "a", undefined, "c"]

and not

["zaacbbbcac", "z", "ac", "a", "bbb", "c"]

because each iteration of the outermost * clears all captured Strings contained in the quantified Atom, which in this case includes capture Strings numbered 2, 3, 4, and 5.

Note 4

Step 1 of the RepeatMatcher's d closure states that, once the minimum number of repetitions has been satisfied, any more expansions of Atom that match the empty character sequence are not considered for further repetitions. This prevents the regular expression engine from falling into an infinite loop on patterns such as:

/(a*)*/.exec("b")

or the slightly more complicated:

/(a*)b\1+/.exec("baaaac")

which returns the array

["b", ""]

3.5Assertion

The production Assertion::^ evaluates by returning an internal AssertionTester closure that takes a State argument x and performs the following steps when evaluated:

  1. Let e be x's endIndex.
  2. If e is zero, return true.
  3. If Multiline is false, return false.
  4. If the character Input[e-1] is one of LineTerminator, return true.
  5. Return false.
Note

Even when the y flag is used with a pattern, ^ always matches only at the beginning of Input, or (if Multiline is true) at the beginning of a line.

The production Assertion::$ evaluates by returning an internal AssertionTester closure that takes a State argument x and performs the following steps when evaluated:

  1. Let e be x's endIndex.
  2. If e is equal to InputLength, return true.
  3. If Multiline is false, return false.
  4. If the character Input[e] is one of LineTerminator, return true.
  5. Return false.

The production Assertion::\b evaluates by returning an internal AssertionTester closure that takes a State argument x and performs the following steps when evaluated:

  1. Let e be x's endIndex.
  2. Call IsWordChar(e-1) and let a be the Boolean result.
  3. Call IsWordChar(e) and let b be the Boolean result.
  4. If a is true and b is false, return true.
  5. If a is false and b is true, return true.
  6. Return false.

The production Assertion::\B evaluates by returning an internal AssertionTester closure that takes a State argument x and performs the following steps when evaluated:

  1. Let e be x's endIndex.
  2. Call IsWordChar(e-1) and let a be the Boolean result.
  3. Call IsWordChar(e) and let b be the Boolean result.
  4. If a is true and b is false, return false.
  5. If a is false and b is true, return false.
  6. Return true.

The production Assertion::(?=Disjunction) evaluates as follows:

  1. Evaluate Disjunction with +1 as its direction argument to obtain a Matcher m.
  2. Return an internal Matcher closure that takes two arguments, a State x and a Continuation c, and performs the following steps:
    1. Let d be a Continuation that always returns its State argument as a successful MatchResult.
    2. Call m(x, d) and let r be its result.
    3. If r is failure, return failure.
    4. Let y be r's State.
    5. Let cap be y's captures List.
    6. Let xe be x's endIndex.
    7. Let z be the State (xe, cap).
    8. Call c(z) and return its result.

The production Assertion::(?!Disjunction) evaluates as follows:

  1. Evaluate Disjunction with +1 as its direction argument to obtain a Matcher m.
  2. Return an internal Matcher closure that takes two arguments, a State x and a Continuation c, and performs the following steps:
    1. Let d be a Continuation that always returns its State argument as a successful MatchResult.
    2. Call m(x, d) and let r be its result.
    3. If r is not failure, return failure.
    4. Call c(x) and return its result.

The production Assertion::(?<=Disjunction) evaluates as follows:

  1. Evaluate Disjunction with -1 as its direction argument to obtain a Matcher m.
  2. Return an internal Matcher closure that takes two arguments, a State x and a Continuation c, and performs the following steps:
    1. Let d be a Continuation that always returns its State argument as a successful MatchResult.
    2. Call m(x, d) and let r be its result.
    3. If r is failure, return failure.
    4. Let y be r's State.
    5. Let cap be y's captures List.
    6. Let xe be x's endIndex.
    7. Let z be the State (xe, cap).
    8. Call c(z) and return its result.

The production Assertion::(?<!Disjunction) evaluates as follows:

  1. Evaluate Disjunction with -1 as its direction argument to obtain a Matcher m.
  2. Return an internal Matcher closure that takes two arguments, a State x and a Continuation c, and performs the following steps:
    1. Let d be a Continuation that always returns its State argument as a successful MatchResult.
    2. Call m(x, d) and let r be its result.
    3. If r is not failure, return failure.
    4. Call c(x) and return its result.

3.5.1Runtime Semantics: WordCharacters Abstract Operation

The abstract operation WordCharacters performs the following steps:

  1. Let A be a set of characters containing the sixty-three characters:
    a b c d e f g h i j k l m n o p q r s t u v w x y z
    A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
    0 1 2 3 4 5 6 7 8 9 _
  2. Let U be an empty set.
  3. For each character c not in set A where Canonicalize(c) is in A, add c to U.
  4. Assert: Unless Unicode and IgnoreCase are both true, U is empty.
  5. Add the characters in set U to set A.
  6. Return A.

3.5.2Runtime Semantics: IsWordChar Abstract Operation

The abstract operation IsWordChar takes an integer parameter e and performs the following steps:

  1. If e is -1 or e is InputLength, return false.
  2. Let c be the character Input[e].
  3. Let wordChars be the result of ! WordCharacters().
  4. If c is in wordChars, return true.
  5. Return false.

3.6Atom

With argument direction.

The production Atom::PatternCharacter evaluates as follows:

  1. Let ch be the character matched by PatternCharacter.
  2. Let A be a one-element CharSet containing the character ch.
  3. Call CharacterSetMatcher(A, false, direction) and return its Matcher result.

The production Atom::. evaluates as follows:

  1. Let A be the set of all characters except LineTerminator.
  2. Call CharacterSetMatcher(A, false, direction) and return its Matcher result.

The production Atom::\AtomEscape evaluates as follows:

  1. Return the Matcher that is the result of evaluating AtomEscape with argument direction.

The production Atom::CharacterClass evaluates as follows:

  1. Evaluate CharacterClass to obtain a CharSet A and a Boolean invert.
  2. Call CharacterSetMatcher(A, invert, direction) and return its Matcher result.

The production Atom::(Disjunction) evaluates as follows:

  1. Evaluate Disjunction with argument direction to obtain a Matcher m.
  2. Let parenIndex be the number of left capturing parentheses in the entire regular expression that occur to the left of this production expansion's initial left parenthesis. This is the total number of times the Atom::(Disjunction) production is expanded prior to this production's Atom plus the total number of Atom::(Disjunction) productions enclosing this Atom.
  3. Return an internal Matcher closure that takes two arguments, a State x and a Continuation c, and performs the following steps:
    1. Let d be an internal Continuation closure that takes one State argument y and performs the following steps:
      1. Let cap be a fresh copy of y's captures List.
      2. Let xe be x's endIndex.
      3. Let ye be y's endIndex.
      4. If direction is equal +1, then
        1. Let s be a fresh List whose characters are the characters of Input at indices xe (inclusive) through ye (exclusive).
      5. Else, direction is equal to -1.
        1. Let s be a fresh List whose characters are the characters of Input at indices ye (inclusive) through xe (exclusive).
      6. Set cap[parenIndex+1] to s.
      7. Let z be the State (ye, cap).
      8. Call c(z) and return its result.
    2. Call m(x, d) and return its result.

The production Atom::(?:Disjunction) evaluates as follows:

  1. Return the Matcher that is the result of evaluating Disjunction with argument direction.

3.6.1Runtime Semantics: CharacterSetMatcher Abstract Operation

With argument direction.

The abstract operation CharacterSetMatcher takes twothree arguments, a CharSet A, a Boolean flag invert and an integer direction, and performs the following steps:

  1. Return an internal Matcher closure that takes two arguments, a State x and a Continuation c, and performs the following steps when evaluated:
    1. Let e be x's endIndex.
    2. Let f be e + direction.
    3. If e is InputLength, return failure.
    4. If f < 0 or f > InputLength, return failure.
    5. Let ch be the character Input[e].
    6. Let index be min(e, f).
    7. Let ch be the character Input[index]..
    8. Let cc be Canonicalize(ch).
    9. If invert is false, then
      1. If there does not exist a member a of set A such that Canonicalize(a) is cc, return failure.
    10. Else invert is true,
      1. If there exists a member a of set A such that Canonicalize(a) is cc, return failure.
    11. Let cap be x's captures List.
    12. Let y be the State (e+1, cap).
    13. Let y be the State (f, cap).
    14. Call c(y) and return its result.

3.6.2Runtime Semantics: Canonicalize ( ch )

The abstract operation Canonicalize takes a character parameter ch and performs the following steps:

  1. If IgnoreCase is false, return ch.
  2. If Unicode is true, then
    1. If the file CaseFolding.txt of the Unicode Character Database provides a simple or common case folding mapping for ch, return the result of applying that mapping to ch.
    2. Else, return ch.
  3. Else,
    1. Assert: ch is a UTF-16 code unit.
    2. Let s be the ECMAScript String value consisting of the single code unit ch.
    3. Let u be the same result produced as if by performing the algorithm for String.prototype.toUpperCase using s as the this value.
    4. Assert: u is a String value.
    5. If u does not consist of a single code unit, return ch.
    6. Let cu be u's single code unit element.
    7. If ch's code unit value ≥ 128 and cu's code unit value < 128, return ch.
    8. Return cu.
Note 1

Parentheses of the form ( Disjunction ) serve both to group the components of the Disjunction pattern together and to save the result of the match. The result can be used either in a backreference (\ followed by a nonzero decimal number), referenced in a replace String, or returned as part of an array from the regular expression matching internal procedure. To inhibit the capturing behaviour of parentheses, use the form (?: Disjunction ) instead.

Note 2

The form (?= Disjunction ) specifies a zero-width positive lookahead. In order for it to succeed, the pattern inside Disjunction must match at the current position, but the current position is not advanced before matching the sequel. If Disjunction can match at the current position in several ways, only the first one is tried. Unlike other regular expression operators, there is no backtracking into a (?= form (this unusual behaviour is inherited from Perl). This only matters when the Disjunction contains capturing parentheses and the sequel of the pattern contains backreferences to those captures.

For example,

/(?=(a+))/.exec("baaabac")

matches the empty String immediately after the first b and therefore returns the array:

["", "aaa"]

To illustrate the lack of backtracking into the lookahead, consider:

/(?=(a+))a*b\1/.exec("baaabac")

This expression returns

["aba", "a"]

and not:

["aaaba", "a"]
Note 3

The form (?! Disjunction ) specifies a zero-width negative lookahead. In order for it to succeed, the pattern inside Disjunction must fail to match at the current position. The current position is not advanced before matching the sequel. Disjunction can contain capturing parentheses, but backreferences to them only make sense from within Disjunction itself. Backreferences to these capturing parentheses from elsewhere in the pattern always return undefined because the negative lookahead must fail for the pattern to succeed. For example,

/(.*?)a(?!(a+)b\2c)\2(.*)/.exec("baaabaac")

looks for an a not immediately followed by some positive number n of a's, a b, another n a's (specified by the first \2) and a c. The second \2 is outside the negative lookahead, so it matches against undefined and therefore always succeeds. The whole expression returns the array:

["baaabaac", "ba", undefined, "abaac"]
Note 4

In case-insignificant matches when Unicode is true, all characters are implicitly case-folded using the simple mapping provided by the Unicode standard immediately before they are compared. The simple mapping always maps to a single code point, so it does not map, for example, "ß" (U+00DF) to "SS". It may however map a code point outside the Basic Latin range to a character within, for example, "ſ" (U+017F) to "s". Such characters are not mapped if Unicode is false. This prevents Unicode code points such as U+017F and U+212A from matching regular expressions such as /[a-z]/i, but they will match /[a-z]/ui.

3.7AtomEscape

With argument direction.

The production AtomEscape::DecimalEscape evaluates as follows:

  1. Evaluate DecimalEscape to obtain an integer n.
  2. If n>NcapturingParens, throw a SyntaxError exception.
  3. Return an internal Matcher closure that takes two arguments, a State x and a Continuation c, and performs the following steps:
    1. Let cap be x's captures List.
    2. Let s be cap[n].
    3. If s is undefined, return c(x).
    4. Let e be x's endIndex.
    5. Let len be s's length.
    6. Let f be e+len.
    7. Let f be e + direction×len.
    8. If f < 0 or f>InputLength, return failure.
    9. Let g be min(e, f).
    10. If there exists an integer i between 0 (inclusive) and len (exclusive) such that Canonicalize(s[i]) is not the same character value as Canonicalize(Input[e+i g+i]), return failure.
    11. Let y be the State (f, cap).
    12. Call c(y) and return its result.

The production AtomEscape::CharacterEscape evaluates as follows:

  1. Evaluate CharacterEscape to obtain a character ch.
  2. Let A be a one-element CharSet containing the character ch.
  3. Call CharacterSetMatcher(A, false, direction) and return its Matcher result.

The production AtomEscape::CharacterClassEscape evaluates as follows:

  1. Evaluate CharacterClassEscape to obtain a CharSet A.
  2. Call CharacterSetMatcher(A, false, direction) and return its Matcher result.
Note

An escape sequence of the form \ followed by a nonzero decimal number n matches the result of the nth set of capturing parentheses (21.2.2.1). It is an error if the regular expression has fewer than n capturing parentheses. If the regular expression has n or more capturing parentheses but the nth one is undefined because it has not captured anything, then the backreference always succeeds.