Implement the v flag for RegExp in SpiderMonkey

The problem

The RegExp object in SpiderMonkey does not support the v flag. The v flag (exposed on RegExp objects as unicodeSets) was standardized in ECMAScript 2024. It is an upgraded version of the u flag: on top of Unicode-aware matching, it enables set notation inside character classes (difference and intersection), string literals inside character classes, and Unicode properties of strings, which can match sequences of more than one code point.

The TC39 proposal for the v flag can be found here

The proposal was co-authored by Mathias Bynens, who was a V8 engineer.

See their blog post here

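Summary of the V8 post

The v flag is typically used when a RegExp has to match Unicode text that a single code point cannot describe, or when a character class needs set operations.

Matching emoji is a good example of where the v flag is useful. With the u flag, \p{Emoji} only ever matches one code point at a time, so it cannot match an emoji made of several code points: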

// Unicode defines a character property named “Emoji”.
const re = /^\p{Emoji}$/u;

// Match an emoji that consists of just 1 code point:
re.test('⚽'); // '\u26BD'
// → true ✅

// Match an emoji that consists of multiple code points:
re.test('👨🏾‍⚕️'); // '\u{1F468}\u{1F3FE}\u200D\u2695\uFE0F'
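// → false ❌

With the v flag, the property of strings \p{RGI_Emoji} can be used instead, and it matches both cases:

const re = /^\p{RGI_Emoji}$/v;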

// Match an emoji that consists of just 1 code point:
re.test('⚽'); // '\u26BD'
// → true ✅

// Match an emoji that consists of multiple code points:
re.test('👨🏾‍⚕️'); // '\u{1F468}\u{1F3FE}\u200D\u2695\uFE0F'
// → true ✅

The test

I didn’t do this at first, but I should have started by writing a test.

The v flag is not tested in the test262 test suite.

So I wrote a test for it and contributed it back to test262.

The Regex v flag

The implementation in SpiderMonkey

The regex engine in SpiderMonkey is V8’s Irregexp engine, which SpiderMonkey imports and wraps rather than maintaining its own.

Reference

More on the regex engine in SpiderMonkey:

https://hacks.mozilla.org/2020/06/a-new-regexp-engine-in-spidermonkey/

Licensed under CC BY-NC-SA 4.0
Last updated on Sep 21, 2024 19:23 UTC