
regexMisleadingUnicodeCharacters

Reports characters in regex character classes that appear as single visual characters but are made of multiple code points.

✅ This rule is included in the ts logical presets.

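Some characters that appear as a single visual unit are actually composed of multiple Unicode code points. When such a character appears in a regex character class, each code point (or, without the u flag, each UTF-16 code unit) becomes a separate member of the class. The class then matches any one of those pieces on its own and never the whole character, which is typically not the intended behavior.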
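A short JavaScript sketch of the behavior described in the paragraph above. It uses escape sequences instead of literal emoji so the characters survive copy-paste; the variable names are illustrative only.

```javascript
// U+1F44D THUMBS UP SIGN is stored in a JS string as two UTF-16 code units.
const thumbsUp = "\u{1F44D}";
console.log(thumbsUp.length); // 2 (code units)
console.log([...thumbsUp].length); // 1 (code point)

// Without the u flag, [\uD83D\uDC4D] is a class of two separate code units:
// it matches either half on its own, never the whole emoji as one character.
const noFlag = /^[\uD83D\uDC4D]$/;
console.log(noFlag.test(thumbsUp)); // false
console.log(noFlag.test("\uD83D")); // true (a lone high surrogate)

// With the u flag, the escaped surrogate pair is read as one code point.
const withFlag = /^[\uD83D\uDC4D]$/u;
console.log(withFlag.test(thumbsUp)); // true

// A combining mark stays a separate code point even with the u flag.
const decomposedA = "A\u0301"; // "A" + U+0301 COMBINING ACUTE ACCENT
console.log(/^[A\u0301]$/u.test(decomposedA)); // false
console.log(/^[A\u0301]$/u.test("\u0301")); // true (the bare accent)
```

The u flag fixes only the surrogate-pair case; combined characters, modifiers, flags, and ZWJ sequences remain several code points either way.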

This rule detects several types of multi-code-point characters in character classes:

  • Surrogate pairs: Characters like 👍 that require two UTF-16 code units, in patterns without the u flag
  • Combined characters: Base characters with combining marks, like Á (A + combining acute accent)
  • Emoji with modifiers: Emoji with skin tone modifiers, like 👶🏻
  • Regional indicator symbols: Flag emoji like 🇯🇵 (two regional indicators)
  • ZWJ sequences: Characters joined with a zero-width joiner, like 👨‍👩‍👦
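The categories in the list can be approximated by scanning the code points of a class's literal text. The sketch below is a hypothetical helper, not Flint's implementation: `findMisleadingSequences` and its labels (`surrogatePair`, `combiningMark`, and so on) are made up for illustration, and it assumes the class text is raw characters (escapes and ranges are not parsed).

```javascript
// Hypothetical helper, not Flint's implementation: given the literal text
// inside a character class, report which multi-code-point patterns it contains.
// Escapes and ranges are not parsed; the text is treated as raw characters.
function findMisleadingSequences(classText, hasUnicodeFlag) {
  const codePoints = [...classText]; // iterate by code point, not code unit
  const findings = new Set();

  const isCombiningMark = (ch) => /^\p{M}$/u.test(ch);
  const inRange = (ch, low, high) => {
    const cp = ch.codePointAt(0);
    return cp >= low && cp <= high;
  };

  for (let i = 0; i < codePoints.length; i++) {
    const ch = codePoints[i];
    const prev = codePoints[i - 1];
    const next = codePoints[i + 1];

    // Astral code points are two UTF-16 code units when the u flag is absent.
    if (!hasUnicodeFlag && ch.codePointAt(0) > 0xffff) findings.add("surrogatePair");
    if (prev === undefined) continue;

    // A combining mark attaches to the code point before it.
    if (isCombiningMark(ch)) findings.add("combiningMark");
    // U+1F3FB..U+1F3FF are the Fitzpatrick skin tone modifiers.
    if (inRange(ch, 0x1f3fb, 0x1f3ff)) findings.add("emojiModifier");
    // Two regional indicators (U+1F1E6..U+1F1FF) in a row form a flag.
    if (inRange(prev, 0x1f1e6, 0x1f1ff) && inRange(ch, 0x1f1e6, 0x1f1ff)) {
      findings.add("regionalIndicator");
    }
    // U+200D ZERO WIDTH JOINER between two code points forms a ZWJ sequence.
    if (ch === "\u200d" && next !== undefined) findings.add("zwj");
  }
  return [...findings];
}

console.log(findMisleadingSequences("\u{1F44D}", false)); // ["surrogatePair"]
console.log(findMisleadingSequences("A\u0301", true)); // ["combiningMark"]
console.log(findMisleadingSequences("\u{1F476}\u{1F3FB}", true)); // ["emojiModifier"]
console.log(findMisleadingSequences("\u{1F1EF}\u{1F1F5}", true)); // ["regionalIndicator"]
console.log(findMisleadingSequences("\u{1F468}\u200d\u{1F469}\u200d\u{1F466}", true)); // ["zwj"]
console.log(findMisleadingSequences("abc", false)); // []
```

A real rule would work on a parsed regex AST so that escapes, ranges, and the v flag are handled; this sketch only shows the code-point checks behind each category.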

Examples of code this rule reports:

// Surrogate pair without the u flag
const surrogatePair = /[👍]/;
// Combined character (written as "A" followed by U+0301 combining acute accent)
const combinedCharacter = /[Á]/;
// Emoji with a skin tone modifier
const emojiModifier = /[👶🏻]/u;
// Regional indicator symbols (flag)
const regionalIndicators = /[🇯🇵]/u;
// ZWJ sequence (family emoji)
const zwjSequence = /[👨‍👩‍👦]/u;

This rule is not configurable.

If you intentionally want to match individual code points rather than visual characters, or if your regex pattern specifically needs to match partial Unicode sequences, you might prefer to disable this rule. Some specialized text processing may require matching individual surrogate halves or combining marks.
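As an example of the code-point-level processing described above, the sketch below deliberately matches individual combining marks to strip diacritics. The function name is illustrative, and whether this rule reports an escaped range like `[\u0300-\u036f]` is not stated on this page.

```javascript
// Deliberately matching combining marks: decompose, then remove the marks.
function stripDiacritics(text) {
  // NFD splits "Á" (U+00C1) into "A" + U+0301 COMBINING ACUTE ACCENT, and
  // U+0300..U+036F is the Combining Diacritical Marks block.
  return text.normalize("NFD").replace(/[\u0300-\u036f]/g, "");
}

console.log(stripDiacritics("\u00C1pple caf\u00E9")); // "Apple cafe"
```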

Made with ❤️‍🔥 in Boston by Josh Goldberg and contributors.