regexMisleadingUnicodeCharacters

Reports characters in regex character classes that appear as single visual characters but are made of multiple code points.

✅ This rule is included in the ts logical presets.

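Some characters that appear as a single visual unit are actually composed of multiple Unicode code points. Inside a regex character class, each code point (or, without the u or v flag, each UTF-16 code unit) becomes a separate alternative, so the class matches fragments of the character rather than the character as a whole. This is typically not the intended behavior.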

This rule detects several types of multi-code-point characters in character classes:

Surrogate pairs: Characters like 👍 that require two UTF-16 code units (a problem only when the regex lacks the unicode flag)
Combined characters: Base characters followed by combining marks, like Á (A + combining acute accent, U+0301)
Emoji with modifiers: Emoji with skin tone modifiers like 👶🏻
Regional indicator symbols: Flag emoji like 🇯🇵 (two regional indicators)
ZWJ sequences: Characters joined with a zero-width joiner like 👨‍👩‍👦
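
To make the problem concrete, here is a minimal sketch of how such classes behave at runtime. It only uses standard `RegExp` behavior; the variable names are illustrative and not part of the rule.

```typescript
// "👍" is one code point (U+1F44D) stored as two UTF-16 code units.
const loneHighSurrogate = "\uD83D";

const results = {
  // Without the u flag, the class holds two code units, so half an emoji matches.
  classMatchesLoneSurrogate: /^[👍]$/.test(loneHighSurrogate), // true
  // The whole emoji is two code units long, so a single class cannot match it.
  classMatchesWholeEmoji: /^[👍]$/.test("👍"), // false
  // With the u flag, the class holds the single code point U+1F44D.
  unicodeClassMatchesWholeEmoji: /^[👍]$/u.test("👍"), // true
  // The u flag does not help with multi-code-point sequences: the flag 🇯🇵
  // is two regional indicators, and the class matches one of them alone.
  flagClassMatchesSingleIndicator: /^[🇯🇵]$/u.test("🇯"), // true
  flagClassMatchesWholeFlag: /^[🇯🇵]$/u.test("🇯🇵"), // false
};

console.log(results);
```

In each failing case the class silently accepts a fragment of the visual character, which is why the rule reports the pattern instead of leaving the mismatch to be found at runtime.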

Examples

Incorrect:

// Surrogate pair without unicode flag
const pattern = /[👍]/;

// Combined character (A + combining acute accent U+0301)
const pattern = /[Á]/;

// Emoji with skin tone modifier
const pattern = /[👶🏻]/u;

// Regional indicator symbols (flag)
const pattern = /[🇯🇵]/u;

// ZWJ sequence (family emoji)
const pattern = /[👨‍👩‍👦]/u;

Correct:

// Unicode flag handles surrogate pairs correctly
const pattern = /[👍]/u;

// Match outside character class
const pattern = /👍/;

// Use precomposed character (single code point U+00C1)
const pattern = /[Á]/;

// Match emoji sequence outside character class
const pattern = /👶🏻/;

// Use \q{} syntax with v flag for grapheme clusters
const pattern = /[\q{👶🏻}]/v;

// Solo regional indicator is fine
const pattern = /[🇯]/u;
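
The alternatives above can be checked the same way. The sketch below is illustrative only: `tryBuild` is a hypothetical helper, not part of the rule, and it exists because the v flag is assumed to need a recent JavaScript engine. On engines without v flag support, the last two checks are `undefined`.

```typescript
// Hypothetical helper: returns undefined when the runtime rejects the flags.
function tryBuild(source: string, flags: string): RegExp | undefined {
  try {
    return new RegExp(source, flags);
  } catch {
    return undefined;
  }
}

const babyLightSkin = "👶🏻"; // U+1F476 followed by U+1F3FB
const baby = "👶"; // U+1F476 alone

// Outside a character class, the sequence is matched as a whole.
const sequence = /^👶🏻$/u;

// With the v flag, \q{...} adds the whole sequence to the class as one string.
const stringClass = tryBuild("^[\\q{👶🏻}]$", "v");

// NFC normalization turns A + U+0301 into the precomposed U+00C1.
const precomposed = "A\u0301".normalize("NFC");

const checks = {
  sequenceMatchesWhole: sequence.test(babyLightSkin), // true
  sequenceMatchesBase: sequence.test(baby), // false
  precomposedIsSingleCodePoint: /^[\u00C1]$/u.test(precomposed), // true
  stringClassMatchesWhole: stringClass?.test(babyLightSkin),
  stringClassMatchesBase: stringClass?.test(baby),
};

console.log(checks);
```

Normalizing input to NFC before matching is one way to make the precomposed form reliable. Whether that is appropriate depends on where the text comes from.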

Options

This rule is not configurable.

When Not To Use It

If you intentionally want to match individual code points rather than visual characters, or if your regex pattern specifically needs to match partial Unicode sequences, you might prefer to disable this rule. Some specialized text processing may require matching individual surrogate halves or combining marks.

Equivalents in Other Linters

Made with ❤️‍🔥 in Boston by Josh Goldberg and contributors.
