December 10, 2020

Sticky flag "y", searching at position

The flag y allows to perform the search at the given position in the source string.

To grasp the use case of y flag, and better understand the ways of regexps, let’s explore a practical example.

One of common tasks for regexps is “lexical analysis”: we get a text, e.g. in a programming language, and need to find its structural elements. For instance, HTML has tags and attributes, JavaScript code has functions, variables, and so on.

Writing lexical analyzers is a special area, with its own tools and algorithms, so we don’t go deep in there, but there’s a common task: to read something at the given position.

E.g. we have a code string let varName = "value", and we need to read the variable name from it, that starts at position 4.

We’ll look for variable name using regexp \w+. Actually, JavaScript variable names need a bit more complex regexp for accurate matching, but here it doesn’t matter.

  • A call to str.match(/\w+/) will find only the first word in the line (let). That’s not it.
  • We can add the flag g. But then the call str.match(/\w+/g) will look for all words in the text, while we need one word at position 4. Again, not what we need.

So, how to search for a regexp exactly at the given position?

Let’s try using method regexp.exec(str).

For a regexp without flags g and y, this method looks only for the first match, it works exactly like str.match(regexp).

…But if there’s flag g, then it performs the search in str, starting from position stored in the regexp.lastIndex property. And, if it finds a match, then sets regexp.lastIndex to the index immediately after the match.

In other words, regexp.lastIndex serves as a starting point for the search, that each regexp.exec(str) call resets to the new value (“after the last match”). That’s only if there’s g flag, of course.

So, successive calls to regexp.exec(str) return matches one after another.

Here’s an example of such calls:

let str = 'let varName'; // Let's find all words in this string
let regexp = /\w+/g;

alert(regexp.lastIndex); // 0 (initially lastIndex=0)

let word1 = regexp.exec(str);
alert(word1[0]); // let (1st word)
alert(regexp.lastIndex); // 3 (position after the match)

let word2 = regexp.exec(str);
alert(word2[0]); // varName (2nd word)
alert(regexp.lastIndex); // 11 (position after the match)

let word3 = regexp.exec(str);
alert(word3); // null (no more matches)
alert(regexp.lastIndex); // 0 (resets at search end)

We can get all matches in the loop:

let str = 'let varName';
let regexp = /\w+/g;

let result;

while (result = regexp.exec(str)) {
  alert( `Found ${result[0]} at position ${result.index}` );
  // Found let at position 0, then
  // Found varName at position 4
}

Such use of regexp.exec is an alternative to method str.matchAll, with a bit more control over the process.

Let’s go back to our task.

We can manually set lastIndex to 4, to start the search from the given position!

Like this:

let str = 'let varName = "value"';

let regexp = /\w+/g; // without flag "g", property lastIndex is ignored

regexp.lastIndex = 4;

let word = regexp.exec(str);
alert(word); // varName

Hooray! Problem solved!

We performed a search of \w+, starting from position regexp.lastIndex = 4.

The result is correct.

…But wait, not so fast.

Please note: the regexp.exec call starts searching at position lastIndex and then goes further. If there’s no word at position lastIndex, but it’s somewhere after it, then it will be found:

let str = 'let varName = "value"';

let regexp = /\w+/g;

// start the search from position 3
regexp.lastIndex = 3;

let word = regexp.exec(str);
// found the match at position 4
alert(word[0]); // varName
alert(word.index); // 4

For some tasks, including the lexical analysis, that’s just wrong. We need to find a match exactly at the given position at the text, not somewhere after it. And that’s what the flag y is for.

The flag y makes regexp.exec to search exactly at position lastIndex, not “starting from” it.

Here’s the same search with flag y:

let str = 'let varName = "value"';

let regexp = /\w+/y;

regexp.lastIndex = 3;
alert( regexp.exec(str) ); // null (there's a space at position 3, not a word)

regexp.lastIndex = 4;
alert( regexp.exec(str) ); // varName (word at position 4)

As we can see, regexp /\w+/y doesn’t match at position 3 (unlike the flag g), but matches at position 4.

Not only that’s what we need, there’s an important performance gain when using flag y.

Imagine, we have a long text, and there are no matches in it, at all. Then a search with flag g will go till the end of the text and find nothing, and this will take significantly more time than the search with flag y, that checks only the exact position.

In tasks like lexical analysis, there are usually many searches at an exact position, to check what we have there. Using flag y is the key for correct implementations and a good performance.

Tutorial map

Comments

read this before commenting…
  • If you have suggestions what to improve - please submit a GitHub issue or a pull request instead of commenting.
  • If you can't understand something in the article – please elaborate.
  • To insert few words of code, use the <code> tag, for several lines – wrap them in <pre> tag, for more than 10 lines – use a sandbox (plnkr, jsbin, codepen…)