Sets and ranges [...]

Several characters or character classes inside square brackets […] mean “search for any one character among the given ones”.

Sets

For instance, [eao] means any of the 3 characters: 'a', 'e', or 'o'.

That’s called a set. Sets can be used in a regexp along with regular characters:

// find [t or m], and then "op"
alert( "Mop top".match(/[tm]op/gi) ); // "Mop", "top"

Please note that although there are multiple characters in the set, they correspond to exactly one character in the match.

So the example below gives no matches:

// find "V", then [o or i], then "la"
alert( "Voila".match(/V[oi]la/) ); // null, no matches

The pattern searches for:

  • V,
  • then one of the letters [oi],
  • then la.

So there would be a match for Vola or Vila.
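The one-character rule can be checked directly. The sketch below is only an illustration: the variable names are made up for this example, and it uses console.log instead of alert so that it also runs outside a browser.

```javascript
// One set in the pattern consumes exactly one character of the string.
const pattern = /V[oi]la/;

// "Voila" fails: after "V" and "o" the pattern expects "la", but finds "ila".
const results = ["Vola", "Vila", "Voila"].map(word => pattern.test(word));

console.log(results); // [true, true, false]
```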

Ranges

Square brackets may also contain character ranges.

For instance, [a-z] is a character in the range from a to z, and [0-5] is a digit from 0 to 5.

In the example below we’re searching for "x" followed by two digits or letters from A to F:

alert( "Exception 0xAF".match(/x[0-9A-F][0-9A-F]/g) ); // xAF

Please note that in the word Exception there’s a substring xce. It didn’t match the pattern, because the letters are lowercase, while in the set [0-9A-F] they are uppercase.

If we want to find it too, then we can add a range a-f: [0-9A-Fa-f]. The i flag would allow lowercase too.
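Both variants can be compared on the same string. This is a small sketch with illustrative variable names. Note that the i flag also makes the leading x case-insensitive, which does not matter for this particular string.

```javascript
const text = "Exception 0xAF";

// Variant 1: list both uppercase and lowercase ranges in the set
const bothRanges = text.match(/x[0-9A-F a-f][0-9A-Fa-f]/g.source === "" ? /x/ : /x[0-9A-Fa-f][0-9A-Fa-f]/g);

// Variant 2: keep the uppercase range and add the i flag
const withFlag = text.match(/x[0-9A-F][0-9A-F]/gi);

console.log(bothRanges); // ["xce", "xAF"]
console.log(withFlag);   // ["xce", "xAF"]
```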

Character classes are shorthands for certain character sets.

For instance:

  • \d – is the same as [0-9],
  • \w – is the same as [a-zA-Z0-9_],
  • \s – is the same as [\t\n\v\f\r ] plus a few other Unicode space characters.

We can use character classes inside […] as well.

For instance, we want to match all word characters or a dash, for words like “twenty-third”. We can’t do it with \w+, because the \w class does not include a dash. But we can use [\w-].
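A quick comparison of the two patterns, as a sketch. The sample phrase is invented for this example.

```javascript
const phrase = "the twenty-third of May";

// \w+ stops at the dash, so the word is split in two
const plainWords = phrase.match(/\w+/g);

// [\w-]+ treats the dash as part of the word
const dashedWords = phrase.match(/[\w-]+/g);

console.log(plainWords);  // ["the", "twenty", "third", "of", "May"]
console.log(dashedWords); // ["the", "twenty-third", "of", "May"]
```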

We also can use a combination of classes to cover every possible character, like [\s\S]. That matches spaces or non-spaces – any character. That’s wider than a dot ".", because the dot matches any character except a newline.
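The difference shows up as soon as the string contains a newline. A minimal sketch, assuming no s flag is set on the regexp (the variable names are illustrative):

```javascript
const twoLines = "A\nB";

// The dot does not match the newline between A and B
const withDot = twoLines.match(/A.B/);

// [\s\S] matches any character, including the newline
const withAnyChar = twoLines.match(/A[\s\S]B/);

console.log(withDot);              // null
console.log(withAnyChar !== null); // true
```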

Excluding ranges

Besides normal ranges, there are “excluding” ranges that look like [^…].

They are denoted by a caret character ^ at the start and match any character except the given ones.

For instance:

  • [^aeyo] – any character except 'a', 'e', 'y' or 'o'.
  • [^0-9] – any character except a digit, the same as \D.
  • [^\s] – any non-space character, same as \S.

The example below looks for any characters except letters, digits and spaces:

alert( "user@example.com".match(/[^\d\sA-Z]/gi) ); // @ and .
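Excluding ranges are handy for throwing away everything that is not of interest. The sketch below keeps only the digits of a made-up phone number and shows that [^0-9] and \D give the same result. It relies on the standard String.prototype.replace method, which this chapter has not introduced.

```javascript
const phone = "+7 (903) 123-45-67"; // hypothetical sample value

// Remove every character that is NOT a digit
const viaExcludingRange = phone.replace(/[^0-9]/g, "");

// \D is the shorthand for the same excluding range
const viaClass = phone.replace(/\D/g, "");

console.log(viaExcludingRange); // "79031234567"
console.log(viaClass);          // "79031234567"
```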

No escaping in […]

Usually, when we want to find a literal dot, we need to escape it as \. And if we need a backslash, we use \\.

In square brackets the vast majority of special characters can be used without escaping:

  • A dot '.'.
  • A plus '+'.
  • Parentheses '( )'.
  • Dash '-' in the beginning or the end (where it does not define a range).
  • A caret '^' if not in the beginning (where it means exclusion).
  • And the opening square bracket '['.

In other words, all special characters are allowed except where they mean something for square brackets.

A dot "." inside square brackets means just a dot. The pattern [.,] would look for one of characters: either a dot or a comma.
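To see the difference, compare the dot inside and outside the brackets. A small sketch with an invented sample string:

```javascript
const prices = "1.5, 2,75 and 3";

// Inside brackets the dot is literal: only dots and commas are found
const separators = prices.match(/[.,]/g);

// Outside brackets the dot matches every character of this string
const everything = prices.match(/./g);

console.log(separators);                          // [".", ",", ","]
console.log(everything.length === prices.length); // true
```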

In the example below the regexp [-().^+] looks for one of the characters -().^+:

// No need to escape
let reg = /[-().^+]/g;

alert( "1 + 2 - 3".match(reg) ); // Matches +, -

…But if you decide to escape them “just in case”, then there would be no harm:

// Escaped everything
let reg = /[\-\(\)\.\^\+]/g;

alert( "1 + 2 - 3".match(reg) ); // also works: +, -

Tasks

We have a regexp /Java[^script]/.

Does it match anything in the string Java? In the string JavaScript?

Answers: no, yes.

  • In the string Java it doesn’t match anything, because [^script] means “any character except the given ones”. So the regexp looks for "Java" followed by one such character, but the string ends there and no characters follow.

    alert( "Java".match(/Java[^script]/) ); // null
  • Yes, because the regexp is case-sensitive (there is no i flag), so "S" is not one of the characters in [^script], and that part matches the character "S".

    alert( "JavaScript".match(/Java[^script]/) ); // "JavaS"

The time can be in the format hours:minutes or hours-minutes. Both hours and minutes have 2 digits: 09:00 or 21-30.

Write a regexp to find time:

let reg = /your regexp/g;
alert( "Breakfast at 09:00. Dinner at 21-30".match(reg) ); // 09:00, 21-30

P.S. In this task we assume that the time is always correct, there’s no need to filter out bad strings like “45:67”. Later we’ll deal with that too.

Answer: \d\d[-:]\d\d.

let reg = /\d\d[-:]\d\d/g;
alert( "Breakfast at 09:00. Dinner at 21-30".match(reg) ); // 09:00, 21-30

Please note that the dash '-' has a special meaning in square brackets, but only between other characters, not when it’s in the beginning or at the end, so we don’t need to escape it.
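The position of the dash changes its meaning, which a short comparison makes visible. This is only a sketch with an invented sample string:

```javascript
const sample = "a-b-c";

// Dash between characters: a range from a to c
const asRange = sample.match(/[a-c]/g);

// Dash at the end: a literal dash next to the characters a and c
const asLiteral = sample.match(/[ac-]/g);

console.log(asRange);   // ["a", "b", "c"]
console.log(asLiteral); // ["a", "-", "-", "c"]
```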
