Regular expressions in JavaScript

Regular expressions provide a powerful way to search for and replace substrings.

They exist in many languages. JavaScript uses a simplified, Perl-like syntax.

Character classes

Let’s say we need to find a digit in a string. Not a particular digit, but any digit: it should find 1 in “Only 1” and 5 in “Give me a 5”.

We could search for each digit from 0 to 9 in a loop using plain substring matching, but a regexp handles this case gracefully with a character class such as \d, which matches any digit.
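For comparison, here is a minimal sketch of both approaches. The helpers `findDigitByLoop` and `findDigitByRegexp` are hypothetical names chosen for illustration: the first tries every digit with `indexOf` and keeps the leftmost hit, the second asks the regexp engine for the first match of `\d`.

```javascript
// Hypothetical helper: find the first digit by trying "0" to "9" one by one.
function findDigitByLoop(str) {
  let best = -1;
  for (let d = 0; d <= 9; d++) {
    const pos = str.indexOf(String(d));
    // Keep the leftmost position seen so far.
    if (pos !== -1 && (best === -1 || pos < best)) best = pos;
  }
  return best === -1 ? null : str[best];
}

// Hypothetical helper: the same search with the \d character class.
function findDigitByRegexp(str) {
  const match = str.match(/\d/); // first match, or null if there is none
  return match ? match[0] : null;
}

console.log(findDigitByLoop("Only 1"));        // 1
console.log(findDigitByRegexp("Give me a 5")); // 5
```

The regexp version states the intent ("any digit") directly and leaves the scanning to the engine.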

Special characters

Some characters have a special meaning in regexps: [ \ ^ $ . | ? * + ( ).

They are special because they extend what a regexp can search for. Don’t try to memorize the list now: once we have covered each of them, they will be easy to remember.

To use a special character as an ordinary one, it must be escaped, in other words, prepended with a backslash.

For example, suppose we need to find a literal dot '.'. In a regexp, the dot is a special symbol meaning any character except a newline, so the dot itself is written as '\.'.
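A short sketch of the difference, using `String.prototype.match` on an invented sample string. When a regexp is built from a string with `new RegExp`, the backslash must itself be escaped in the string literal, so `\.` becomes `"\\."`.

```javascript
const versionText = "Version 5x1 or 5.1?";

// Unescaped '.' means "any character except a newline", so it also matches 'x'.
const anyChar = versionText.match(/5.1/)[0];                   // "5x1"
// Escaped '\.' matches only a literal dot.
const literalDot = versionText.match(/5\.1/)[0];               // "5.1"
// In a string passed to new RegExp, the backslash itself must be doubled.
const fromString = versionText.match(new RegExp("5\\.1"))[0];  // "5.1"
// The unescaped dot does not match a newline.
const crossesNewline = /5.1/.test("5\n1");                     // false

console.log(anyChar, literalDot, fromString, crossesNewline);
```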

Character sets and ranges

Several characters or character classes may be grouped in square brackets [...] to search for any of them.

For instance, [eao] means any one of the characters ‘e’, ‘a’ or ‘o’. It matches a single character from the list.
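A quick check that a set consumes exactly one character (the sample text is made up for illustration):

```javascript
const words = "cat, cot, cut, coat";

// [ao] stands for exactly ONE character, either 'a' or 'o'.
// "cut" fails because 'u' is not in the set; "coat" fails because it has two characters between 'c' and 't'.
const found = words.match(/c[ao]t/g);
console.log(found); // [ 'cat', 'cot' ]
```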

Introduction

Regular expressions are a powerful way to search and replace in strings. In JavaScript, they are integrated into the String methods search, match and replace.
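A minimal sketch of the three methods on one sample string (assuming a modern JavaScript engine such as Node.js or a browser console):

```javascript
const sentence = "I love JavaScript";

// search: index of the first match, or -1 if there is none
const position = sentence.search(/Java/);          // 7
// match without the g flag: an array describing the first match, or null
const firstMatch = sentence.match(/Java/)[0];      // "Java"
// replace: substitutes the first match (all matches with the g flag)
const replaced = sentence.replace(/love/, "like"); // "I like JavaScript"

console.log(position, firstMatch, replaced);
```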

Flags

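A regular expression may have optional flags that affect the search, for instance i (case-insensitive search), g (find all matches instead of only the first) and m (multiline mode).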
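A small sketch of the `i` and `g` flags, plus the `flags` property that reports which flags a regexp carries. The sample string is invented for illustration.

```javascript
const pets = "Cat, cat, CAT";

// No flags: case-sensitive, first match only
const plain = pets.match(/cat/);   // plain[0] is "cat", plain.index is 5
// g: all matches; i: ignore case
const all = pets.match(/cat/gi);   // [ 'Cat', 'cat', 'CAT' ]
// The flags property lists the flags set on a regexp
const flags = /cat/gi.flags;       // "gi"

console.log(plain[0], plain.index, all, flags);
```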

Numeric Quantifiers

Say we need to find a 3-digit number. With \d, that’s simple:

showMatch( "I'm 100 years old", /\d\d\d/ )  // 100

But let’s go a step further. What if we want to search for 5-digit numbers? Should we repeat \d five times: \d\d\d\d\d?

Luckily, there is a better way.
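One such way is a numeric quantifier: `{n}` after a token repeats it exactly `n` times. The sketch below uses the standard `String.prototype.match` instead of the tutorial's `showMatch` helper, so it runs on its own; the sample strings are made up.

```javascript
// Repeating \d by hand
const threeByHand = "I'm 100 years old".match(/\d\d\d/)[0];     // "100"

// {n} repeats the preceding token exactly n times
const threeQuantified = "I'm 100 years old".match(/\d{3}/)[0];  // "100"
const five = "The zip code is 12345".match(/\d{5}/)[0];          // "12345"

console.log(threeByHand, threeQuantified, five);
```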

Greedy and Lazy

  1. The searching algorithm
  2. Lazy mode
  3. Alternative approach

Let’s look under the hood of the regexp engine and see how the search is performed. Understanding this is essential for writing anything more complex than /\d/.
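As a preview, here is a sketch contrasting greedy and lazy matching on a string with two quoted words. The negated set `[^"]` in the last line is one common alternative approach; whether it is the exact one the section has in mind is an assumption.

```javascript
const quoted = 'a "witch" and her "broom" is one';

// Greedy: .+ runs to the end of the string, then backtracks to the LAST quote.
const greedy = quoted.match(/".+"/g);     // [ '"witch" and her "broom"' ]
// Lazy: .+? extends one character at a time and stops at the first closing quote.
const lazy = quoted.match(/".+?"/g);      // [ '"witch"', '"broom"' ]
// Alternative: forbid quotes inside the match, so .+ cannot run past a closing quote.
const negated = quoted.match(/"[^"]+"/g); // [ '"witch"', '"broom"' ]

console.log(greedy, lazy, negated);
```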

Anchors and multiline mode

  1. Multiline mode

The caret '^' and the dollar '$' have a special meaning in a regexp. They are called anchors.

The caret '^' matches at the start of the text, and the dollar '$' matches at its end.
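A sketch of both anchors, including the m flag that switches on multiline mode so that `^` also matches right after each newline (sample strings are invented):

```javascript
const phrase = "Mary had a little lamb";
const startsWithMary = /^Mary/.test(phrase); // true: "Mary" is at the text start
const endsWithLamb = /lamb$/.test(phrase);   // true: "lamb" is at the text end
const startsWithLamb = /^lamb/.test(phrase); // false

// ^ and $ together: the whole string must consist of digits
const allDigits = /^\d+$/.test("12345");     // true
const notAllDigits = /^\d+$/.test("123a5");  // false

// Multiline mode (flag m): ^ also matches at the start of every line
const podium = "1st place\n2nd place\n3rd place";
const firstLineOnly = podium.match(/^\d/g);  // [ '1' ]
const everyLine = podium.match(/^\d/gm);     // [ '1', '2', '3' ]

console.log(startsWithMary, endsWithLamb, startsWithLamb, allDigits, notAllDigits);
console.log(firstLineOnly, everyLine);
```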

Alternation

Alternation is denoted by the vertical line '|'. It allows a choice between several alternatives.
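A minimal sketch of alternation. The second part shows that `|` has the lowest precedence, so parentheses are needed to limit which part of the pattern the alternatives cover.

```javascript
const history = "First HTML appeared, then CSS, then JavaScript";
const languages = history.match(/html|css|javascript/gi); // [ 'HTML', 'CSS', 'JavaScript' ]

// '|' has the lowest precedence: this means "I love HTML" OR "CSS"
const loose = /I love HTML|CSS/.test("CSS");     // true
// Parentheses limit the alternation: "I love HTML" OR "I love CSS"
const strict = /I love (HTML|CSS)/.test("CSS");  // false

console.log(languages, loose, strict);
```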

Groups

A part of a regular expression can be grouped together in parentheses (...).

A quantifier placed after a group then applies to the whole group instead of just one character.
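A sketch comparing a quantifier on a single character with the same quantifier on a group (sample string invented for illustration):

```javascript
const sample = "gooo gogogo";

// Without a group, + applies only to 'o': "go", "goo", "gooo", ...
const single = sample.match(/go+/)[0];   // "gooo"
// With a group, + applies to the whole "go": "go", "gogo", "gogogo", ...
const grouped = sample.match(/(go)+/g);  // [ 'go', 'gogogo' ]

console.log(single, grouped);
```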

Quantifiers +, * and ?

The short quantifiers '+', '*' and '?' are used very widely.

Basically, they are convenient shortcuts for numeric quantifiers: '+' means {1,} (one or more), '*' means {0,} (zero or more) and '?' means {0,1} (zero or one).
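A sketch of the three short quantifiers on made-up strings, with the equivalent long form for `+` at the end:

```javascript
// ? is {0,1}: the 'u' is optional
const spellings = "color or colour".match(/colou?r/g);  // [ 'color', 'colour' ]
// + is {1,}: one or more digits
const numbers = "+7(903)-123-45-67".match(/\d+/g);      // [ '7', '903', '123', '45', '67' ]
// * is {0,}: a digit followed by any number of zeros
const tens = "100 10 1".match(/\d0*/g);                 // [ '100', '10', '1' ]

// The long form {1,} gives the same result as +
const numbersLong = "+7(903)-123-45-67".match(/\d{1,}/g);

console.log(spellings, numbers, tens, numbersLong);
```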

Word boundary

Another position check is the word boundary \b. It doesn’t match a character; instead, it matches at a position where a word character is followed by a non-word character, or vice versa. The start and end of the text also count as the non-word side.
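A sketch of `\b` in a few situations. In JavaScript, the word characters for `\b` are those of `\w`, that is, Latin letters, digits and the underscore; the sample strings are invented.

```javascript
const exact = /\bJava\b/.test("Hello, Java!");            // true
const insideWord = /\bJava\b/.test("Hello, JavaScript!"); // false: 'S' follows "Java"
// The start and end of the text count as boundaries
const wholeText = /\bHello\b/.test("Hello");              // true
// Standalone two-digit numbers only
const twoDigits = "1 23 456 78".match(/\b\d\d\b/g);       // [ '23', '78' ]

console.log(exact, insideWord, wholeText, twoDigits);
```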

Practice

Here you will find tasks that help you understand the principles of building regexps.

Infinite backtracking problem

The backtracking nature of regular expressions in JavaScript (and in most other languages that use the same kind of regexp engine) may lead to extremely long, almost infinite, search times.
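A sketch of the classic trigger, a nested quantifier such as `(\d+)*`. When the overall match fails, the engine tries every way of splitting the digit run between the inner and outer quantifier, and the number of splits grows exponentially with the length. The helper `timeTest` is hypothetical, and timings depend on the engine and machine, so none are shown; keep `n` small, because each extra digit roughly doubles the time for `slow`.

```javascript
// Nested quantifiers: (\d+)* can split a run of digits in exponentially many ways.
const slow = /^(\d+)*$/;
// Same meaning without nesting: the string consists only of digits (possibly none).
const fast = /^\d*$/;

// Hypothetical helper: run regexp.test and measure the elapsed milliseconds.
function timeTest(regexp, str) {
  const start = Date.now();
  const result = regexp.test(str);
  return { result, ms: Date.now() - start };
}

// A trailing non-digit forces the engine to try every split before failing.
for (const n of [16, 18, 20]) {
  const input = "1".repeat(n) + "z";
  console.log(n, timeTest(slow, input), timeTest(fast, input));
}
```

Rewriting the pattern without nesting, as in `fast`, keeps the same meaning and removes the blow-up.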
