Character classes

Let’s say we need to find a digit in a string. Not a particular digit, but any digit, so the search should find 1 in "Only 1" and 5 in "Give me a 5".

Substring matching could do this by searching for each digit from 0 to 9 in a loop. But regexp matching handles the case much more gracefully.
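For comparison, here is a minimal sketch of the loop-based approach. The function name findFirstDigit is hypothetical and used only for illustration; it shows how much bookkeeping plain substring search needs for this simple task.

```javascript
// Hypothetical helper: find the first digit with plain substring search.
function findFirstDigit(str) {
  let bestPos = -1;
  // Try every digit from 0 to 9 and remember the leftmost hit.
  for (let digit = 0; digit <= 9; digit++) {
    const pos = str.indexOf(String(digit));
    if (pos !== -1 && (bestPos === -1 || pos < bestPos)) {
      bestPos = pos;
    }
  }
  return bestPos === -1 ? null : str[bestPos];
}

console.log(findFirstDigit("Only 1"));      // 1
console.log(findFirstDigit("Give me a 5")); // 5
console.log(findFirstDigit("No digits"));   // null
```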

A regular expression may have a character class instead of an exact character.

For example, an arbitrary digit is denoted as \d in the regular expression. The example below matches a digit:

showMatch( "I'm 5 years old", /\d/ )   // 5
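The showMatch helper is not defined in this section. The sketch below is one possible implementation, assuming it should print (and return) the first match, or null when nothing matches. It is built on the standard String.prototype.match method.

```javascript
// Assumed behavior: print and return the first match, or null if there is none.
function showMatch(str, regexp) {
  const result = str.match(regexp);       // null when the regexp does not match
  const found = result ? result[0] : null;
  console.log(found);
  return found;
}

showMatch("I'm 5 years old", /\d/); // 5
```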

The most useful classes are:

\d: a digit, any character from 0 to 9.
\s: a whitespace character, such as a space, tab or newline.
\w: a "word" character: a Latin letter, a digit or an underscore '_'.
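To see which of the three classes a given character belongs to, a small check like the one below may help. The classify function is a hypothetical name; it relies only on the standard RegExp.prototype.test method.

```javascript
// Hypothetical helper: report which of the three classes match a single character.
function classify(ch) {
  return {
    digit: /\d/.test(ch), // 0 to 9
    space: /\s/.test(ch), // space, tab, newline and similar
    word: /\w/.test(ch),  // Latin letter, digit or underscore
  };
}

for (const ch of ["7", "\t", "_", "Z", "-"]) {
  console.log(JSON.stringify(ch), classify(ch));
}
```

Note that a digit belongs to both \d and \w, while a character such as '-' belongs to none of the three.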

A regexp may combine regular symbols and character classes:

showMatch( "I'm the 1st one", /\dst/ )   // matches '1st'

The regexp below uses several classes at once:

showMatch( "I'm 1 year old", /\d\s\w\w\w\w/ )   // 1 year

There are also inverted character classes:

\D: a non-digit, the inversion of \d.
\S: a non-whitespace character, the inversion of \s.
\W: a character that is neither a Latin letter, nor a digit, nor an underscore, the inversion of \w.

In the example below, we look for the first non-word character:

showMatch( "I'm 1 year old", /\W/ )   // matches apostrophe '
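As a further illustration, the sketch below applies each inverted class to one sample string. The string "42_apples!" is an arbitrary example chosen so that the three classes give three different results.

```javascript
const sample = "42_apples!";

// The first character that is not a digit: the underscore.
const firstNonDigit = sample.match(/\D/)[0];
// The first character that is not whitespace: the very first one.
const firstNonSpace = sample.match(/\S/)[0];
// The first character that is not a letter, digit or underscore.
const firstNonWord = sample.match(/\W/)[0];

console.log(firstNonDigit); // _
console.log(firstNonSpace); // 4
console.log(firstNonWord);  // !
```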

A regexp may also contain non-printable characters such as \n, \t and others. These are ordinary characters, not classes.
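A short sketch of the difference, using an arbitrary sample string: \t and \n each match exactly one specific character, while the class \s matches either of them.

```javascript
const text = "name\tvalue\nnext line";

// \t and \n each match one specific character at its position.
const tabIndex = text.match(/\t/).index;
const newlineIndex = text.match(/\n/).index;
console.log(tabIndex);     // 4
console.log(newlineIndex); // 10

// The class \s matches any whitespace, so it finds the tab first.
const firstSpace = text.match(/\s/)[0];
console.log(firstSpace === "\t"); // true
```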

Spaces are important

Usually, we don’t pay much attention to spaces. To the eye, 1-5 and 1 - 5 look almost the same.

But in regular expressions, a space is just like any other symbol.

The regexp below finds nothing, because it doesn’t account for the spaces in the string:

showMatch( "1 - 5", /\d-\d/ )  // no matches!

Let’s fix it. We could put literal spaces in the regexp or, better, use the generic whitespace class \s:

showMatch( "1 - 5", /\d - \d/ )   // works
showMatch( "1 - 5", /\d\s-\s\d/ ) // also works
showMatch( "1-5", /\d - \d/ ) // fails! (no spaces in string)

The last match fails because the string has no spaces. So don’t put extra spaces in a regular expression: every one of them is meaningful.
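One reason to prefer \s over a literal space is that it also covers tabs and other whitespace. The sketch below uses an arbitrary tab-separated string to illustrate this.

```javascript
const literalSpaces = /\d - \d/;
const anyWhitespace = /\d\s-\s\d/;

const tabbed = "1\t-\t5"; // tabs around the dash instead of spaces

// A literal space in the regexp matches only a space character.
console.log(literalSpaces.test(tabbed));  // false
// \s matches the tabs as well.
console.log(anyWhitespace.test(tabbed));  // true
console.log(anyWhitespace.test("1 - 5")); // true
// Each \s still requires exactly one whitespace character.
console.log(anyWhitespace.test("1-5"));   // false
```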

In a regular expression, the dot '.' denotes any character except a newline:

showMatch( "A char", /ch.r/ ) // "char"
showMatch( "A ch-r", /ch.r/ ) // "ch-r"
showMatch( "A ch r", /ch.r/ ) // "ch r", the space is also a char

The dot stands for any character, but there must be a character for it to match:

showMatch( "A chr", /ch.r/ ) // not found
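The newline exception can be checked in the same way. This is a minimal sketch using the standard RegExp.prototype.test method, with no flags set on the regexp.

```javascript
const dotPattern = /ch.r/;

console.log(dotPattern.test("A ch r"));   // true: the space counts as a character
console.log(dotPattern.test("A chr"));    // false: nothing for the dot to match
console.log(dotPattern.test("A ch\nr"));  // false: the dot does not match a newline
```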