Character sets and ranges

Several characters or character classes may be grouped in square brackets [...] to search for any of them.

For instance, [eao] means any of the characters ‘a’, ‘e’, or ‘o’. It matches exactly one character from the list.

showMatch( "The OGRE on green grass!", /gr[eao]/gi ) // "GRE", "gre", "gra"

Here, gr[eao] matches gre, not gree, because [eao] stands for only one char.
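The showMatch helper used in these examples is not defined in this section. A minimal stand-in, assuming it does nothing more than list every match, can be sketched with the built-in String.prototype.match. The name listMatches is hypothetical and not part of the tutorial:

```javascript
// Hypothetical stand-in for showMatch: returns every match as an array.
// Assumes the regexp carries the g flag. match() returns null when nothing
// matches, so fall back to an empty array.
function listMatches(str, re) {
  return str.match(re) || [];
}

const found = listMatches("The OGRE on green grass!", /gr[eao]/gi);
console.log(found); // [ 'GRE', 'gre', 'gra' ]
```

In "green" the match is "gre" and stops there, since the set consumes exactly one character.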

A time can be written as hour:minute or hour-minute, where both the hour and the minute have 2 digits:

09:00
21-30

Create a regular expression to find all times in: Breakfast at 09:00. Dinner at 21-30.

Solution

The regular expression: \d\d[-:]\d\d.

showMatch( "Breakfast at 09:00. Dinner at 21-30.", /\d\d[-:]\d\d/g )

Note that in the character set, the hyphen '-' is not escaped, because in the first position it cannot denote a range.

The flag g means a global search: all matches are found instead of only the first one.
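The effect of the g flag can be seen with the built-in String.prototype.match, used here instead of the tutorial's showMatch helper. This is only a sketch; the variable names are illustrative:

```javascript
const text = "Breakfast at 09:00. Dinner at 21-30.";

// Without g, match() stops at the first match
// (the result array also carries details such as the match index).
const first = text.match(/\d\d[-:]\d\d/);
console.log(first[0]); // "09:00"

// With g, match() returns all matches as plain strings.
const all = text.match(/\d\d[-:]\d\d/g);
console.log(all); // [ '09:00', '21-30' ]
```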

Square brackets can also contain character ranges. For example, [a-z] is a character from a to z, [0-5] matches a character from 0 to 5.

showMatch( "Exception 0xAF", /x[A-F]/g ) // matches "xA", not "xc"

The example above doesn’t find xc in Exception, because the range contains only uppercase characters and there is no i flag.

The regexp matches xA, and not xAF, because [A-F] is a single character from A to F.
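To see how the i flag changes the result, the same pattern can be run with and without it. The sketch below uses the built-in String.prototype.match rather than showMatch, and the variable names are illustrative:

```javascript
const sample = "Exception 0xAF";

// Case-sensitive: only the uppercase range A-F is allowed after x.
const strict = sample.match(/x[A-F]/g);
console.log(strict); // [ 'xA' ]

// With the i flag the range also covers a-f, so "xc" in "Exception" matches.
const loose = sample.match(/x[A-F]/gi);
console.log(loose); // [ 'xc', 'xA' ]
```

In both cases each match is two characters long, because the range stands for a single character.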

Characters, classes and ranges can be put together.

The example finds any character from the ranges a-f and A-F, the letter x, or a digit:

showMatch( "look -> 0xAF", /[\dA-Fa-fx]/g ) // "0", "x", "A", "F"

Most character classes are actually a short representation of ranges, for example:

  • \d is the same as [0-9],
  • \w is the same as [a-zA-Z0-9_],
  • \s is the same as [\t\n\v\f\r ] plus several Unicode space symbols.
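These equivalences can be checked directly. The sketch below assumes a plain JavaScript regexp without the u or v flags; the sample string and variable names are made up for illustration:

```javascript
const line = "Item_7 costs 42\tUSD";

// \d and [0-9] find the same characters.
const digitsShort = line.match(/\d/g);
const digitsLong = line.match(/[0-9]/g);
console.log(digitsShort, digitsLong); // both: [ '7', '4', '2' ]

// \w and [a-zA-Z0-9_] find the same characters too.
const wordShort = line.match(/\w/g).join("");
const wordLong = line.match(/[a-zA-Z0-9_]/g).join("");
console.log(wordShort === wordLong); // true

// \s is wider than the ASCII list: it also matches a non-breaking space.
const nbsp = "\u00A0";
const shortMatchesNbsp = /\s/.test(nbsp);
const listMatchesNbsp = /[\t\n\v\f\r ]/.test(nbsp);
console.log(shortMatchesNbsp, listMatchesNbsp); // true false
```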

There are also negated ranges: [^...].

Square brackets starting with a caret, [^...], match any character except the given ones.

For example:

  • [^aeou] - any character except ‘a’, ‘e’, ‘o’, ‘u’
  • [^0-9] - any non-digit, same as \D
  • [^\s] - any non-whitespace character, same as \S
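The equivalences in the list can be verified with a short sketch. The sample strings and variable names are made up for illustration, and String.prototype.match is used in place of showMatch:

```javascript
const phone = "+7 (903) 123-45";

// [^0-9] and \D pick out the same non-digit characters.
const viaSet = phone.match(/[^0-9]/g);
const viaClass = phone.match(/\D/g);
console.log(viaSet); // [ '+', ' ', '(', ')', ' ', '-' ]
console.log(viaSet.join("") === viaClass.join("")); // true

// [^\s] and \S both skip the whitespace.
const noSpaceSet = "a b\tc".match(/[^\s]/g);
const noSpaceClass = "a b\tc".match(/\S/g);
console.log(noSpaceSet, noSpaceClass); // both: [ 'a', 'b', 'c' ]
```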

Just like the ordinary range, a negated range may contain multiple characters and ranges.
The example below looks for non-letters, non-digits, non-spaces:

showMatch( "alice15@gmail.com", /[^\d\sA-Z]/gi ) // "@", "."

Does the pattern k[^s] match the text sock?

Solution

The regexp looks for a character "k" followed by a character which can be anything except s.

But in sock, there is no character after k, hence no match.

A character set [...] must always match a character, no matter if it is inverted or not.
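A quick check of this rule, as a sketch using the built-in String.prototype.match (which returns null when there is no match). The extra test strings are illustrative:

```javascript
// "k" is the last character of "sock", so [^s] has nothing to match.
const atEnd = "sock".match(/k[^s]/);
console.log(atEnd); // null

// Any character other than "s" after "k" is enough, even "!".
const bang = "sock!".match(/k[^s]/);
console.log(bang[0]); // "k!"

// An "s" after "k" is excluded by the set.
const plural = "socks".match(/k[^s]/);
console.log(plural); // null
```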

Most special characters can be used in square brackets without escaping.

In square brackets you only need to escape the closing square bracket ']', and the backslash '\'.

Other special characters are escaped only if they may have a special meaning:

  • The hyphen '-' must be escaped only if it’s in between other symbols. If it is first or last, it cannot denote a range, and hence can come unescaped: [-...].
  • The caret symbol '^' must be escaped only if it’s the first symbol: [\^..].
  • All other characters, including the dot '.', plus '+', brackets '( )', the opening square bracket '[', etc. can appear unescaped.
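The rules above can be tried out one by one. This is a sketch that assumes a plain JavaScript regexp without the u or v flags (those modes are stricter about what may appear in a set); the sample strings and variable names are illustrative:

```javascript
// A hyphen between two symbols forms a range; escaped or placed first, it is literal.
const range = "a-m".match(/[a-z]/g);     // [ 'a', 'm' ]
const escaped = "a-m".match(/[a\-z]/g);  // [ 'a', '-' ]
const leading = "a-m".match(/[-az]/g);   // [ 'a', '-' ]

// A caret negates the set only when it comes first.
const negated = "^ab".match(/[^a]/g);       // [ '^', 'b' ]
const escapedCaret = "^ab".match(/[\^a]/g); // [ '^', 'a' ]
const laterCaret = "^ab".match(/[a^]/g);    // [ '^', 'a' ]

// The closing bracket and the backslash are escaped with a backslash.
const specials = "x]y\\z".match(/[\]\\]/g); // [ ']', '\\' ]

// Dot, plus, parentheses and "[" are literal inside a set.
const literal = "1+(2.5)[".match(/[.+()[]/g); // [ '+', '(', '.', ')', '[' ]

console.log(range, escaped, leading);
console.log(negated, escapedCaret, laterCaret);
console.log(specials, literal);
```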

If you look at most regular expressions in the code around you, special characters are usually escaped no matter where they are in the regexp.
But inside square brackets most of that escaping can be dropped, which makes the pattern more readable.

For example, the regexp [-().^] literally means any of the characters from the list -().^. Regexp special symbols do not have any special meaning here.

var re = /[-().^]/g

showMatch( "f(g)-^1", re ) // matches (, ), -, ^

So, technically it is possible to save on extra backslashes in square brackets. But if you forget and put them in, nothing breaks.

var re = /[\-\(\)\.\^]/g

showMatch( "f(g)-^1", re ) // matches same (, ), -, ^
