Word boundary

Another position check is the word boundary \b. It doesn’t match a character; it matches at a position where a wordly character (\w) stands next to a non-wordly one, in either order. The start and the end of the text also count as “non-wordly”.

For example, \bdog\b matches a standalone dog, not doggy or catdog:

showMatch( "doggy catdog dog", /\bdog\b/ ) // "dog"

Here, dog matches, because the previous char is a space (non-wordly), and the next position is text end.
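
The same check can be reproduced without the showMatch helper. The sketch below is only an illustration that uses the standard String.prototype.match method; the variable names are made up for this example. It compares the pattern with and without the boundaries and prints where each match starts.

```javascript
const text = "doggy catdog dog";

// Without the g flag, match() returns the first match together with its index.
const standalone = text.match(/\bdog\b/);
console.log(standalone[0], standalone.index); // "dog" 13

// Without the boundaries, the first "dog" found is the beginning of "doggy".
const anywhere = text.match(/dog/);
console.log(anywhere[0], anywhere.index); // "dog" 0
```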

Normally, \w{4} matches 4 consecutive word characters.
If the word is long enough, it may match multiple times:

showMatch( "Boombaroom", /\w{4}/g) // 'Boom', 'baro'

Appending \b causes \w{4}\b to match only at word end:

showMatch( "Because life is awesome", /\w{4}\b/g) // 'ause', 'life', 'some'

Like ^ and $, the word boundary \b doesn’t match a character. It only checks the position.

Let’s add the check on the other side as well, \b\w{4}\b:

showMatch( "Because life is awesome", /\b\w{4}\b/g) //  'life'

Now there is only one result: life. Here is how the search goes:

  1. The regexp engine matches the first word boundary \b at position 0.
  2. Then it successfully matches \w{4} (Beca), but fails to match the closing \b, because the next character u is also wordly.

    So, the match at position zero fails.

  3. The search continues from position 1, and the closest \b is right after Because (position 7).

    Now \w{4} doesn’t match, because the next character is a space.

  4. The search continues, and the closest \b is right before life, at position 8.

    Finally, \w{4} matches and the position check \b after it is positive. We’ve got the result.

  5. The search continues after the match, but doesn’t yield new results.
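
To see where the engine can stop along the way, the sketch below lists every position at which \b succeeds in the sample string and then runs the full pattern. It is an illustration only: it relies on the standard String.prototype.matchAll method, and the variable names are made up for this example.

```javascript
const phrase = "Because life is awesome";

// \b is zero-width, so every match is an empty string; only the index matters.
const boundaries = [...phrase.matchAll(/\b/g)].map(m => m.index);
console.log(boundaries); // [0, 7, 8, 12, 13, 15, 16, 23]

// Only the boundary at index 8 is followed by exactly four word characters
// and then another boundary.
const fourLetter = [...phrase.matchAll(/\b\w{4}\b/g)].map(m => [m.index, m[0]]);
console.log(fourLetter); // [[8, "life"]]
```

Positions are counted from zero, so index 23 is the end of the text, which also counts as a boundary because the last character is wordly.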

The word boundary check \b works only for words written in the Latin alphabet, because it treats only \w characters (Latin letters, digits and the underscore) as “wordly”. Sometimes that’s acceptable, but it limits where the feature can be used.
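
The limitation is easy to observe. The sketch below runs the same pattern on a Latin and on a Cyrillic string. The last part is an assumption rather than something the text above prescribes: it emulates a boundary with lookarounds over Unicode properties, which needs an engine that supports the u flag and lookbehind. The variable names are made up for this example.

```javascript
const latin = "Hello world";
const cyrillic = "Привет мир";

console.log(latin.match(/\b\w+\b/g));    // ["Hello", "world"]
console.log(cyrillic.match(/\b\w+\b/g)); // null, because no character here is \w

// Assumed workaround: a run of letters, digits or underscores that is not
// preceded or followed by another such character.
const unicodeWord = /(?<![\p{L}\p{N}_])[\p{L}\p{N}_]+(?![\p{L}\p{N}_])/gu;
console.log(cyrillic.match(unicodeWord)); // ["Привет", "мир"]
```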

For completeness, there is also an inverse check \B, which matches at any position that is not a word boundary. It is used very rarely.
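
As a small illustration of \B, the sketch below looks for a dog that ends a word but does not start one. It uses the standard String.prototype.matchAll method, and the variable names are made up for this example.

```javascript
const sample = "doggy catdog dog";

// \B requires a non-boundary before "dog", so only the one inside "catdog" fits.
const inner = [...sample.matchAll(/\Bdog\b/g)].map(m => m.index);
console.log(inner); // [9]
```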


