Proposal proposal-intl-segmenter

Stage 4 Draft / January 21, 2022

Intl.Segmenter Proposal

1 Segmenter Objects

1.1 The Intl.Segmenter Constructor

The Segmenter constructor is the %Segmenter% intrinsic object and a standard built-in property of the Intl object. Behaviour common to all service constructor properties of the Intl object is specified in 9.1.

1.1.1 Intl.Segmenter ( [ locales [ , options ] ] )

When the Intl.Segmenter function is called with optional arguments locales and options, the following steps are taken:

  1. If NewTarget is undefined, throw a TypeError exception.
  2. Let internalSlotsList be « [[InitializedSegmenter]], [[Locale]], [[SegmenterGranularity]] ».
  3. Let segmenter be ? OrdinaryCreateFromConstructor(NewTarget, "%Segmenter.prototype%", internalSlotsList).
  4. Let requestedLocales be ? CanonicalizeLocaleList(locales).
  5. Set options to ? GetOptionsObject(options).
  6. Let opt be a new Record.
  7. Let matcher be ? GetOption(options, "localeMatcher", "string", « "lookup", "best fit" », "best fit").
  8. Set opt.[[localeMatcher]] to matcher.
  9. Let localeData be %Segmenter%.[[LocaleData]].
  10. Let r be ResolveLocale(%Segmenter%.[[AvailableLocales]], requestedLocales, opt, %Segmenter%.[[RelevantExtensionKeys]], localeData).
  11. Set segmenter.[[Locale]] to r.[[locale]].
  12. Let granularity be ? GetOption(options, "granularity", "string", « "grapheme", "word", "sentence" », "grapheme").
  13. Set segmenter.[[SegmenterGranularity]] to granularity.
  14. Return segmenter.
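
As a rough illustration of the steps above, the following JavaScript sketch exercises the constructor's observable behaviour. It assumes a runtime that ships Intl.Segmenter (for example a recent Node.js or browser); the locale "en" and the invalid value "paragraph" are arbitrary choices made for the example.

```javascript
// Step 1: calling without `new` leaves NewTarget undefined, so a TypeError is thrown.
let calledWithoutNew;
try {
  Intl.Segmenter("en");
} catch (e) {
  calledWithoutNew = e;
}
console.log(calledWithoutNew instanceof TypeError); // true

// Step 12: "granularity" falls back to "grapheme" when the option is absent.
const defaultGranularity = new Intl.Segmenter("en").resolvedOptions().granularity;
console.log(defaultGranularity); // grapheme

// Step 12: a value outside the allowed list is rejected by GetOption with a RangeError.
let badOption;
try {
  new Intl.Segmenter("en", { granularity: "paragraph" });
} catch (e) {
  badOption = e;
}
console.log(badOption instanceof RangeError); // true
```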

1.2 Properties of the Intl.Segmenter Constructor

The Intl.Segmenter constructor has the following properties:

1.2.1 Intl.Segmenter.prototype

The value of Intl.Segmenter.prototype is %Segmenter.prototype%.

This property has the attributes { [[Writable]]: false, [[Enumerable]]: false, [[Configurable]]: false }.

1.2.2 Intl.Segmenter.supportedLocalesOf ( locales [ , options ] )

When the supportedLocalesOf method is called with arguments locales and options, the following steps are taken:

  1. Let availableLocales be %Segmenter%.[[AvailableLocales]].
  2. Let requestedLocales be ? CanonicalizeLocaleList(locales).
  3. Return ? SupportedLocales(availableLocales, requestedLocales, options).
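
A minimal usage sketch of supportedLocalesOf follows. The requested tags are arbitrary, and which of them come back depends on the implementation's [[AvailableLocales]], so the example only relies on properties that follow from the steps above.

```javascript
// The result is an array holding the subset of requested tags that is supported.
const requested = ["en-US", "fr"];
const supported = Intl.Segmenter.supportedLocalesOf(requested);
console.log(Array.isArray(supported)); // true
console.log(supported.every((tag) => requested.includes(tag))); // true

// Step 2: CanonicalizeLocaleList rejects structurally invalid tags with a RangeError.
let invalidTag;
try {
  Intl.Segmenter.supportedLocalesOf(["en_US"]); // underscore is not a valid separator
} catch (e) {
  invalidTag = e;
}
console.log(invalidTag instanceof RangeError); // true
```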

1.2.3 Internal slots

The value of the [[AvailableLocales]] internal slot is implementation-defined within the constraints described in 9.1.

The value of the [[LocaleData]] internal slot is implementation-defined within the constraints described in 9.1.

The value of the [[RelevantExtensionKeys]] internal slot is « ».

Note
CLDR defines several extension keys, but this specification does not expose them.
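
One observable consequence of the empty [[RelevantExtensionKeys]] list can be sketched as follows. The tag "en-u-ss-standard" is only an example input; the sketch assumes that ResolveLocale, given no relevant keys, carries no Unicode extension over into the resolved locale.

```javascript
// The "-u-ss-standard" extension is accepted as part of a valid language tag,
// but it is not expected to appear in the resolved locale.
const resolved = new Intl.Segmenter("en-u-ss-standard").resolvedOptions();
console.log(resolved.locale.includes("-u-")); // false
```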

1.3 Properties of the Intl.Segmenter Prototype Object

The Intl.Segmenter prototype object is itself an ordinary object. %Segmenter.prototype% is not an Intl.Segmenter instance and does not have an [[InitializedSegmenter]] internal slot or any of the other internal slots of Intl.Segmenter instance objects.

1.3.1 Intl.Segmenter.prototype.constructor

The initial value of Intl.Segmenter.prototype.constructor is %Segmenter%.

1.3.2 Intl.Segmenter.prototype [ @@toStringTag ]

The initial value of the @@toStringTag property is the String value "Intl.Segmenter".

This property has the attributes { [[Writable]]: false, [[Enumerable]]: false, [[Configurable]]: true }.
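
For illustration, the tag and its attributes can be observed from script as in this short sketch (the locale "en" is arbitrary):

```javascript
// Object.prototype.toString consults @@toStringTag on the prototype chain.
const tag = Object.prototype.toString.call(new Intl.Segmenter("en"));
console.log(tag); // [object Intl.Segmenter]

// The property is non-writable and non-enumerable, but configurable.
const descriptor = Object.getOwnPropertyDescriptor(
  Intl.Segmenter.prototype,
  Symbol.toStringTag
);
console.log(descriptor.writable, descriptor.enumerable, descriptor.configurable);
// false false true
```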

1.3.3 Intl.Segmenter.prototype.segment ( string )

The Intl.Segmenter.prototype.segment method is called on an Intl.Segmenter instance with argument string to create a Segments instance for the string using the locale and options of the Intl.Segmenter instance. The following steps are taken:

  1. Let segmenter be the this value.
  2. Perform ? RequireInternalSlot(segmenter, [[InitializedSegmenter]]).
  3. Let string be ? ToString(string).
  4. Return ! CreateSegmentsObject(segmenter, string).
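
The following sketch shows two effects of these steps: the argument is converted with ToString (step 3), and the receiver must be an initialized Intl.Segmenter (step 2). The inputs are arbitrary example values.

```javascript
const segmenter = new Intl.Segmenter("en"); // granularity defaults to "grapheme"

// Step 3: a non-string argument is converted first, so 123 is segmented as "123".
const fromNumber = Array.from(segmenter.segment(123), (s) => s.segment);
console.log(fromNumber); // [ '1', '2', '3' ]

// Step 2: a receiver without [[InitializedSegmenter]] causes a TypeError.
let wrongReceiver;
try {
  Intl.Segmenter.prototype.segment.call({}, "abc");
} catch (e) {
  wrongReceiver = e;
}
console.log(wrongReceiver instanceof TypeError); // true
```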

1.3.4 Intl.Segmenter.prototype.resolvedOptions ( )

This function provides access to the locale and options computed during initialization of the object.

  1. Let segmenter be the this value.
  2. Perform ? RequireInternalSlot(segmenter, [[InitializedSegmenter]]).
  3. Let options be ! OrdinaryObjectCreate(%Object.prototype%).
  4. For each row of Table 1, except the header row, in table order, do
    1. Let p be the Property value of the current row.
    2. Let v be the value of segmenter's internal slot whose name is the Internal Slot value of the current row.
    3. Assert: v is not undefined.
    4. Perform ! CreateDataPropertyOrThrow(options, p, v).
  5. Return options.

Table 1: Resolved Options of Segmenter Instances

Internal Slot              Property
[[Locale]]                 "locale"
[[SegmenterGranularity]]   "granularity"
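
As a small sketch of what Table 1 implies for script code, the returned object carries exactly the two listed properties, in table order. The exact "locale" value depends on the implementation's available locales, so the example only inspects its type.

```javascript
const segmenterForOptions = new Intl.Segmenter("en", { granularity: "sentence" });
const options = segmenterForOptions.resolvedOptions();

console.log(Object.keys(options)); // [ 'locale', 'granularity' ]
console.log(typeof options.locale); // string
console.log(options.granularity); // sentence

// Step 3 creates a new object on every call, so results are never shared.
console.log(segmenterForOptions.resolvedOptions() !== options); // true
```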

1.4 Properties of Intl.Segmenter Instances

Intl.Segmenter instances are ordinary objects that inherit properties from %Segmenter.prototype%.

Intl.Segmenter instances have an [[InitializedSegmenter]] internal slot.

Intl.Segmenter instances also have internal slots that are computed by the constructor:

  • [[Locale]] is a String value with the language tag of the locale whose localization is used for segmentation.
  • [[SegmenterGranularity]] is one of the String values "grapheme", "word", or "sentence", identifying the kind of text element to segment.

1.5 Segments Objects

A Segments instance is an object that represents the segments of a specific string, subject to the locale and options of its constructing Intl.Segmenter instance.

1.5.1 CreateSegmentsObject ( segmenter, string )

The CreateSegmentsObject abstract operation is called with arguments Intl.Segmenter instance segmenter and String value string to create a Segments instance referencing both. The following steps are taken:

  1. Let internalSlotsList be « [[SegmentsSegmenter]], [[SegmentsString]] ».
  2. Let segments be ! OrdinaryObjectCreate(%SegmentsPrototype%, internalSlotsList).
  3. Set segments.[[SegmentsSegmenter]] to segmenter.
  4. Set segments.[[SegmentsString]] to string.
  5. Return segments.

1.5.2 The %SegmentsPrototype% Object

The %SegmentsPrototype% object:

  • is the prototype of all Segments objects.
  • is an ordinary object.
  • has the following properties:

1.5.2.1 %SegmentsPrototype%.containing ( index )

The containing method is called on a Segments instance with argument index to return a Segment Data object describing the segment in the string that includes the code unit at the specified index, according to the locale and options of the Segments instance's constructing Intl.Segmenter instance. The following steps are taken:

  1. Let segments be the this value.
  2. Perform ? RequireInternalSlot(segments, [[SegmentsSegmenter]]).
  3. Let segmenter be segments.[[SegmentsSegmenter]].
  4. Let string be segments.[[SegmentsString]].
  5. Let len be the length of string.
  6. Let n be ? ToIntegerOrInfinity(index).
  7. If n < 0 or n ≥ len, return undefined.
  8. Let startIndex be ! FindBoundary(segmenter, string, n, before).
  9. Let endIndex be ! FindBoundary(segmenter, string, n, after).
  10. Return ! CreateSegmentDataObject(segmenter, string, startIndex, endIndex).
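
A short usage sketch of containing follows. The input "Hello world" and the word granularity are example choices; boundary positions are implementation-dependent in general, although this simple English input is expected to split into "Hello", " ", and "world".

```javascript
const wordSegments = new Intl.Segmenter("en", { granularity: "word" })
  .segment("Hello world");

// Index 7 is the "o" of "world"; the containing segment starts at code unit 6.
const hit = wordSegments.containing(7);
console.log(hit.segment, hit.index); // world 6

// Step 7: indices outside the range 0 to len - 1 yield undefined instead of throwing.
console.log(wordSegments.containing(-1)); // undefined
console.log(wordSegments.containing(11)); // undefined (len is 11)

// Step 6: the argument goes through ToIntegerOrInfinity,
// so a missing argument and 0.9 both behave like 0.
console.log(wordSegments.containing().segment); // Hello
console.log(wordSegments.containing(0.9).segment); // Hello
```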

1.5.2.2 %SegmentsPrototype% [ @@iterator ] ( )

The @@iterator method is called on a Segments instance to create a Segment Iterator over its string using the locale and options of its constructing Intl.Segmenter instance. The following steps are taken:

  1. Let segments be the this value.
  2. Perform ? RequireInternalSlot(segments, [[SegmentsSegmenter]]).
  3. Let segmenter be segments.[[SegmentsSegmenter]].
  4. Let string be segments.[[SegmentsString]].
  5. Return ! CreateSegmentIterator(segmenter, string).
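
Because Segments instances expose @@iterator, they can be consumed with for-of or spread syntax. The sketch below uses an "e" followed by U+0301 COMBINING ACUTE ACCENT as an example of a grapheme cluster that spans two code units.

```javascript
const graphemes = new Intl.Segmenter("en", { granularity: "grapheme" })
  .segment("e\u0301a");

// Collect the code unit length and start index of every segment.
const pieces = [];
for (const { segment, index } of graphemes) {
  pieces.push([segment.length, index]);
}
console.log(pieces); // [ [ 2, 0 ], [ 1, 2 ] ]

// Each call creates a new Segment Iterator, so the same Segments
// instance can be iterated more than once.
console.log([...graphemes].length); // 2
console.log(graphemes[Symbol.iterator]() !== graphemes[Symbol.iterator]()); // true
```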

1.5.3 Properties of Segments Instances

Segments instances are ordinary objects that inherit properties from %SegmentsPrototype%.

Segments instances have a [[SegmentsSegmenter]] internal slot that references the constructing Intl.Segmenter instance.

Segments instances have a [[SegmentsString]] internal slot that references the String value whose segments they expose.
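
Since both containing and iteration read the same [[SegmentsSegmenter]] and [[SegmentsString]] slots, they should describe the same segments. The sketch below, using an arbitrary example string, walks the string with containing and compares the result to ordinary iteration.

```javascript
const sample = new Intl.Segmenter("en", { granularity: "word" })
  .segment("Hello world");

// Walk forward: each segment ends where the next one starts.
const viaContaining = [];
let position = 0;
let current = sample.containing(position);
while (current !== undefined) {
  viaContaining.push(current.segment);
  position = current.index + current.segment.length;
  current = sample.containing(position); // undefined once position reaches len
}

const viaIterator = Array.from(sample, (s) => s.segment);
console.log(JSON.stringify(viaContaining) === JSON.stringify(viaIterator)); // true
```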

1.6 Segment Iterator Objects

A Segment Iterator is an object that represents a particular iteration over the segments of a specific string.

1.6.1 CreateSegmentIterator ( segmenter, string )

The CreateSegmentIterator abstract operation is called with arguments Intl.Segmenter instance segmenter and String value string to create a Segment Iterator over string using the locale and options of segmenter. The following steps are taken:

  1. Let internalSlotsList be « [[IteratingSegmenter]], [[IteratedString]], [[IteratedStringNextSegmentCodeUnitIndex]] ».
  2. Let iterator be ! OrdinaryObjectCreate(%SegmentIteratorPrototype%, internalSlotsList).
  3. Set iterator.[[IteratingSegmenter]] to segmenter.
  4. Set iterator.[[IteratedString]] to string.
  5. Set iterator.[[IteratedStringNextSegmentCodeUnitIndex]] to 0.
  6. Return iterator.

1.6.2 The %SegmentIteratorPrototype% Object

The %SegmentIteratorPrototype% object:

  • is the prototype of all Segment Iterator objects.
  • is an ordinary object.
  • has a [[Prototype]] internal slot whose value is the intrinsic object %Iterator.prototype%.
  • has the following properties:

1.6.2.1 %SegmentIteratorPrototype%.next ( )

The next method is called on a Segment Iterator instance to advance it forward one segment and return an IteratorResult object either describing the new segment or declaring iteration done. The following steps are taken:

  1. Let iterator be the this value.
  2. Perform ? RequireInternalSlot(iterator, [[IteratingSegmenter]]).
  3. Let segmenter be iterator.[[IteratingSegmenter]].
  4. Let string be iterator.[[IteratedString]].
  5. Let startIndex be iterator.[[IteratedStringNextSegmentCodeUnitIndex]].
  6. Let endIndex be ! FindBoundary(segmenter, string, startIndex, after).
  7. If endIndex is not finite, then
    1. Return ! CreateIterResultObject(undefined, true).
  8. Set iterator.[[IteratedStringNextSegmentCodeUnitIndex]] to endIndex.
  9. Let segmentData be ! CreateSegmentDataObject(segmenter, string, startIndex, endIndex).
  10. Return ! CreateIterResultObject(segmentData, false).
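
The steps above can be observed by driving an iterator manually, as in this sketch with the example string "ab" at grapheme granularity:

```javascript
const iterator = new Intl.Segmenter("en", { granularity: "grapheme" })
  .segment("ab")[Symbol.iterator]();

const first = iterator.next();
const second = iterator.next();
const third = iterator.next();

console.log(first.value.segment, first.value.index, first.done); // a 0 false
console.log(second.value.segment, second.value.index, second.done); // b 1 false

// After the last segment the next start index equals the string length,
// FindBoundary reports no finite boundary, and iteration is declared done.
console.log(third.value, third.done); // undefined true
```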

1.6.2.2 %SegmentIteratorPrototype% [ @@toStringTag ]

The initial value of the @@toStringTag property is the String value "Segmenter String Iterator".

This property has the attributes { [[Writable]]: false, [[Enumerable]]: false, [[Configurable]]: true }.
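
A brief sketch of how the prototype's properties surface in script follows. It also relies on the @@iterator method that Segment Iterators inherit from %Iterator.prototype%, which returns the iterator itself.

```javascript
const segmentIterator = new Intl.Segmenter("en")
  .segment("abc")[Symbol.iterator]();

// The tag comes from %SegmentIteratorPrototype%.
console.log(Object.prototype.toString.call(segmentIterator));
// [object Segmenter String Iterator]

// Inherited from %Iterator.prototype%: the iterator is itself iterable.
console.log(segmentIterator[Symbol.iterator]() === segmentIterator); // true
```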

1.6.3 Properties of Segment Iterator Instances

Segment Iterator instances are ordinary objects that inherit properties from %SegmentIteratorPrototype%. Segment Iterator instances are initially created with the internal slots described in Table 2.

Table 2: Internal Slots of Segment Iterator Instances

Internal Slot                                Description
[[IteratingSegmenter]]                       The Intl.Segmenter instance used for iteration.
[[IteratedString]]                           The String value being iterated upon.
[[IteratedStringNextSegmentCodeUnitIndex]]   The code unit index in the String value being iterated upon at the start of the next segment.

1.7 Segment Data Objects

A Segment Data object is an object that represents a particular segment from a string.

1.7.1 CreateSegmentDataObject ( segmenter, string, startIndex, endIndex )

The CreateSegmentDataObject abstract operation is called with arguments Intl.Segmenter instance segmenter, String value string, and indices startIndex and endIndex within string to create a Segment Data object describing the segment within string from segmenter that is bounded by the indices. The following steps are taken:

  1. Let len be the length of string.
  2. Assert: startIndex ≥ 0.
  3. Assert: endIndex ≤ len.
  4. Assert: startIndex < endIndex.
  5. Let result be ! OrdinaryObjectCreate(%Object.prototype%).
  6. Let segment be the substring of string from startIndex to endIndex.
  7. Perform ! CreateDataPropertyOrThrow(result, "segment", segment).
  8. Perform ! CreateDataPropertyOrThrow(result, "index", 𝔽(startIndex)).
  9. Perform ! CreateDataPropertyOrThrow(result, "input", string).
  10. Let granularity be segmenter.[[SegmenterGranularity]].
  11. If granularity is "word", then
    1. Let isWordLike be a Boolean value indicating whether the segment in string is "word-like" according to locale segmenter.[[Locale]].
    2. Perform ! CreateDataPropertyOrThrow(result, "isWordLike", isWordLike).
  12. Return result.
Note
Whether a segment is "word-like" is implementation-dependent, and implementations are recommended to use locale-sensitive tailorings. In general, segments consisting solely of spaces and/or punctuation (such as those terminated with "WORD_NONE" boundaries by ICU [International Components for Unicode, documented at https://unicode-org.github.io/icu-docs/]) are not considered to be "word-like".
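
The shape of Segment Data objects can be sketched as follows. The example strings are arbitrary, and the "isWordLike" values shown are what ICU-based implementations are expected to report for this input; other implementations may classify segments differently.

```javascript
// Grapheme (and sentence) granularity: three properties, in creation order.
const graphemeData = [
  ...new Intl.Segmenter("en", { granularity: "grapheme" }).segment("Hi"),
][0];
console.log(Object.keys(graphemeData)); // [ 'segment', 'index', 'input' ]

// Word granularity: step 11 adds "isWordLike".
const wordData = [
  ...new Intl.Segmenter("en", { granularity: "word" }).segment("Hello, world!"),
];
console.log(Object.keys(wordData[0])); // [ 'segment', 'index', 'input', 'isWordLike' ]

// A common use: keep only word-like segments, dropping spaces and punctuation.
const words = wordData.filter((s) => s.isWordLike).map((s) => s.segment);
console.log(words); // [ 'Hello', 'world' ] with ICU-based implementations
```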

1.8 Abstract Operations for Segmenter Objects

1.8.1 FindBoundary ( segmenter, string, startIndex, direction )

The FindBoundary abstract operation is called with arguments Intl.Segmenter instance segmenter, String string, integer startIndex, and direction (which must be before or after). It finds a segmentation boundary between two code units in string in the specified direction from the code unit at index startIndex, according to the locale and options of segmenter, and returns the code unit index immediately following that boundary (which will be infinite if no such boundary exists).

Note
Boundary determination is implementation-dependent, but general default algorithms are specified in Unicode Standard Annex 29 (available at https://www.unicode.org/reports/tr29/). It is recommended that implementations use locale-sensitive tailorings such as those provided by the Common Locale Data Repository (available at http://cldr.unicode.org).

The following steps are taken:

  1. Let locale be segmenter.[[Locale]].
  2. Let granularity be segmenter.[[SegmenterGranularity]].
  3. Let len be the length of string.
  4. If direction is before, then
    1. Assert: startIndex ≥ 0.
    2. Assert: startIndex < len.
    3. Search string for the last segmentation boundary that is preceded by at most startIndex code units from the beginning, using locale locale and text element granularity granularity.
    4. If a boundary is found, return the count of code units in string preceding it.
    5. Return 0.
  5. Assert: direction is after.
  6. If len is 0 or startIndex ≥ len, return +∞.
  7. Search string for the first segmentation boundary that follows the code unit at index startIndex, using locale locale and text element granularity granularity.
  8. If a boundary is found, return the count of code units in string preceding it.
  9. Return len.
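
The control flow of FindBoundary can be modelled in JavaScript as in the sketch below. It is not how an implementation finds boundaries: the hypothetical helper findBoundary takes a precomputed, sorted list of boundary positions (each given as the count of code units preceding the boundary) in place of the locale-sensitive search, and the sample list is an assumed word segmentation of "Hello world".

```javascript
// Hypothetical model of FindBoundary; `boundaries` stands in for the
// implementation-dependent search and must be sorted in ascending order.
function findBoundary(boundaries, len, startIndex, direction) {
  if (direction === "before") {
    // Last boundary preceded by at most startIndex code units, else 0.
    let found = 0;
    for (const b of boundaries) {
      if (b <= startIndex) found = b;
    }
    return found;
  }
  // direction is "after".
  if (len === 0 || startIndex >= len) return Infinity;
  // First boundary that follows the code unit at startIndex, else len.
  for (const b of boundaries) {
    if (b > startIndex) return b;
  }
  return len;
}

// Assumed boundaries for "Hello world" (length 11) at word granularity.
const boundaries = [0, 5, 6, 11];
console.log(findBoundary(boundaries, 11, 7, "before")); // 6
console.log(findBoundary(boundaries, 11, 7, "after")); // 11
console.log(findBoundary(boundaries, 11, 11, "after")); // Infinity
```

With startIndex 7 the two calls return 6 and 11, which matches the start and end of the segment "world" that containing(7) would describe for the same input.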

A Implementation Dependent Behaviour

The following aspects of the ECMAScript 2021 Internationalization API Specification are implementation dependent: