#4159 — 21.2.5.* RegExp algorithm: in several places, in case of unicode-matching, index should be conditionally incremented by more than 1


Take, for example, 21.2.5.2.2 RegExpBuiltinExec, step 15.c.ii:

ii. Let lastIndex = lastIndex + 1.

If fullUnicode (step 13) is true, one should test whether the current code unit is a high surrogate and the following one a low surrogate, in which case one should advance by 2 instead of 1.

This is correctly implemented in 21.2.5.11 RegExp.prototype[@@split], steps 24.e.i-ii and 24.f.iii.1-2, but not in the other algorithms of section 21.2.5.

Here is a proposed patch:



NextStringIndex(string, index, unicode)
------------------------------------
This abstract operation returns index + 1, or index + 2 if unicode is true and there is a matching pair of surrogates in `string` at position `index`.

1. Assert `string` is a String.
2. Assert `index` is an integer between 0 and 2^53-1.
3. Assert `unicode` is a Boolean.
4. Let `length` be the number of code units in `string`.
5. If `unicode` is false, return `index` + 1.
6. If `index` + 1 >= `length`, return `index` + 1.
7. Let `first` be the code unit value at index `index` in `string`.
8. If `first` < 0xD800 or `first` > 0xDBFF, return `index` + 1.
9. Let `second` be the code unit value at index `index` + 1 in `string`.
10. If `second` < 0xDC00 or `second` > 0xDFFF, return `index` + 1.
11. Return `index` + 2.
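
For illustration only, the steps above can be sketched in JavaScript. The function name `nextStringIndex` and the choice to throw on a failed assertion are assumptions of this sketch, not part of the proposed spec text.

```javascript
// Sketch of the proposed NextStringIndex abstract operation.
// Spec assertions are modelled as thrown errors (an assumption of this sketch).
function nextStringIndex(string, index, unicode) {
  if (typeof string !== "string") throw new TypeError("string must be a String");
  if (!Number.isInteger(index) || index < 0 || index > Number.MAX_SAFE_INTEGER) {
    throw new RangeError("index must be an integer between 0 and 2^53-1");
  }
  if (typeof unicode !== "boolean") throw new TypeError("unicode must be a Boolean");

  const length = string.length;                  // number of code units
  if (!unicode) return index + 1;                // non-unicode matching: always advance by 1
  if (index + 1 >= length) return index + 1;     // no room for a second code unit

  const first = string.charCodeAt(index);
  if (first < 0xD800 || first > 0xDBFF) return index + 1;   // not a high surrogate

  const second = string.charCodeAt(index + 1);
  if (second < 0xDC00 || second > 0xDFFF) return index + 1; // not a low surrogate

  return index + 2;                              // skip the whole surrogate pair
}
```

With the string "a\uD83D\uDE00b", the call nextStringIndex(S, 1, true) yields 3, while nextStringIndex(S, 1, false) yields 2.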




21.2.5.2.2 RegExpBuiltinExec
----------------------------
Replace step 15.c.ii with:

ii. Let lastIndex be NextStringIndex(S, lastIndex, fullUnicode).



21.2.5.6 RegExp.prototype[@@match]
----------------------------------
Current step 8 becomes:

8. Else, global is true,
a. Let unicodeMatching be ToBoolean(Get(rx, "unicode")).
b. ReturnIfAbrupt(unicodeMatching).
c. (proceed with current step a)

Current steps 8.e.iv.5.c-d become:

c. Let nextIndex be NextStringIndex(S, thisIndex, unicodeMatching).
d. Let setStatus be Set(rx, "lastIndex", nextIndex, true).
e. ReturnIfAbrupt(setStatus).
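
As a rough sketch of what the revised empty-match step does in the global @@match loop, the following JavaScript drives `exec` by hand and records where each match starts. The names `advance` and `globalMatchPositions` are hypothetical, and `advance` uses `String.prototype.codePointAt` as a shortcut rather than the explicit surrogate checks of NextStringIndex.

```javascript
// Hypothetical helper: same result as NextStringIndex, written with codePointAt.
// codePointAt returns a value above 0xFFFF only when a valid surrogate pair
// starts at `index`, and undefined when `index` is past the end of the string.
function advance(S, index, unicode) {
  const cp = S.codePointAt(index);
  return unicode && cp !== undefined && cp > 0xFFFF ? index + 2 : index + 1;
}

// Collects the start index of every match of a global regexp `rx` in `S`,
// advancing lastIndex past empty matches as in the revised steps c-e.
function globalMatchPositions(rx, S) {
  const unicodeMatching = rx.unicode;
  const positions = [];
  rx.lastIndex = 0;
  let result;
  while ((result = rx.exec(S)) !== null) {
    positions.push(result.index);
    if (result[0] === "") {
      // Empty match: move lastIndex forward so the loop makes progress.
      rx.lastIndex = advance(S, rx.lastIndex, unicodeMatching);
    }
  }
  return positions;
}
```

For S = "a\u{1F600}b", the empty pattern with flags "gu" is visited at indices 0, 1, 3 and 4, so lastIndex never lands between the two surrogates; with flags "g" alone, index 2 is visited as well.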


21.2.5.8 RegExp.prototype[@@replace]
------------------------------------
Current step 10 becomes:

10. If global is true,
a. Let unicodeMatching be ToBoolean(Get(rx, "unicode")).
b. ReturnIfAbrupt(unicodeMatching).
c. (proceed with current step a)

Current steps 13.d.iii.3.c-d become:

c. Let nextIndex be NextStringIndex(S, thisIndex, unicodeMatching).
d. Let setStatus be Set(rx, "lastIndex", nextIndex, true).
e. ReturnIfAbrupt(setStatus).



21.2.5.11 RegExp.prototype[@@split]
-----------------------------------
Using the NextStringIndex abstract operation, step 24.e can be rewritten:

e. If z is null, let q be NextStringIndex(S, q, unicodeMatching).

and step 24.f.iii:

iii. If e = p, let q be NextStringIndex(S, q, unicodeMatching).
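
The observable effect of the surrogate-aware advance in @@split can be seen with an empty pattern, assuming an engine that implements the unicode flag as specified. The variable names below are illustrative only.

```javascript
// "a", U+1F600 (encoded as the surrogate pair D83D DE00), "b"
const input = "a\u{1F600}b";

// Without the unicode flag, q advances one code unit at a time,
// so the surrogate pair is split into two separate elements.
const byCodeUnit = input.split(/(?:)/);

// With the unicode flag, q advances past the whole pair,
// so the astral character stays in one element.
const byCodePoint = input.split(/(?:)/u);

console.log(byCodeUnit.length);  // 4
console.log(byCodePoint.length); // 3
```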


fixed in rev36 editor's draft


in rev36