Quantifiers +, *, ? and {n}

Let’s say we have a string like +7(903)-123-45-67 and want to find all numbers in it. But unlike before, we are interested not in single digits, but in full numbers: 7, 903, 123, 45, 67.

A number is a sequence of 1 or more digits \d. To say how many we need, we use a quantifier.

Quantity {n}

The simplest quantifier is a number in curly braces: {n}. A quantifier is placed after a character (or a character class and so on) and specifies how many of them we need.

It has a few advanced forms; let’s look at examples:

Exact count: {5}

\d{5} denotes exactly 5 digits, the same as \d\d\d\d\d.

The example below looks for a 5-digit number:

alert( "I'm 12345 years old".match(/\d{5}/) ); //  "12345"

We can add \b to exclude longer numbers: \b\d{5}\b.
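To see what the word boundaries change, here is a small sketch that runs both patterns against a test string of our own. The string and the variable names are illustrative, and console.log stands in for alert so that the snippet also runs outside the browser:

```javascript
// Illustrative input: a 5-digit, a 6-digit and another 5-digit number.
const text = "12345 123456 54321";

// Without \b, the first 5 digits of the longer number match as well.
const loose = text.match(/\d{5}/g);

// With \b on both sides, only standalone 5-digit numbers match.
const strict = text.match(/\b\d{5}\b/g);

console.log(loose);  // [ '12345', '12345', '54321' ]
console.log(strict); // [ '12345', '54321' ]
```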

The count from-to: {3,5}

To find numbers of 3 to 5 digits, we can put the limits into curly braces: \d{3,5}

alert( "I'm not 12, but 1234 years old".match(/\d{3,5}/) ); // "1234"
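By default the quantifier takes as many digits as the upper limit allows, so a longer number is cut at 5 digits rather than skipped. A minimal sketch with a sample string of our own (console.log is used in place of alert so it also runs outside the browser):

```javascript
// Illustrative input: numbers of 2, 3 and 7 digits.
const sample = "12 123 1234567";

// "12" is too short; "123" fits; from "1234567" only the first 5 digits
// are taken, and the remaining "67" is too short to match on its own.
const found = sample.match(/\d{3,5}/g);

console.log(found); // [ '123', '12345' ]
```

If such partial matches are unwanted, word boundaries \b around the pattern exclude them, as with \b\d{5}\b above.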

We can omit the upper limit. Then the regexp \d{3,} looks for numbers of 3 or more digits:

alert( "I'm not 12, but 345678 years old".match(/\d{3,}/) ); // "345678"

In the case of the string +7(903)-123-45-67 we need numbers: one or more digits in a row. That is \d{1,}:

let str = "+7(903)-123-45-67";

let numbers = str.match(/\d{1,}/g);

alert(numbers); // 7,903,123,45,67

Shorthands

The most commonly used quantifiers have shorthands:

+

Means “one or more”, the same as {1,}.

For instance, \d+ looks for numbers:

let str = "+7(903)-123-45-67";

alert( str.match(/\d+/g) ); // 7,903,123,45,67

?

Means “zero or one”, the same as {0,1}. In other words, it makes the symbol optional.

For instance, the pattern ou?r looks for o followed by zero or one u, and then r.

So it can find or in the word color and our in colour:

let str = "Should I write color or colour?";

alert( str.match(/colou?r/g) ); // color, colour

*

Means “zero or more”, the same as {0,}. That is, the character may repeat any number of times or be absent.

The example below looks for a digit followed by any number of zeroes:

alert( "100 10 1".match(/\d0*/g) ); // 100, 10, 1

Compare it with '+' (one or more):

alert( "100 10 1".match(/\d0+/g) ); // 100, 10
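Each shorthand is just a shorter spelling of the curly-brace form. As a quick check, the sketch below compares the results of both spellings on the strings used above. The helper sameMatches is our own, introduced only for this illustration:

```javascript
// Hypothetical helper: true if both patterns find the same matches in str.
function sameMatches(str, shorthand, explicit) {
  return JSON.stringify(str.match(shorthand)) === JSON.stringify(str.match(explicit));
}

// + is {1,}
const plusSame = sameMatches("+7(903)-123-45-67", /\d+/g, /\d{1,}/g);
// ? is {0,1}
const questionSame = sameMatches("Should I write color or colour?", /colou?r/g, /colou{0,1}r/g);
// * is {0,}
const starSame = sameMatches("100 10 1", /\d0*/g, /\d0{0,}/g);

console.log(plusSame, questionSame, starSame); // true true true
```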

More examples

Quantifiers are used very often. They are one of the main “building blocks” for complex regular expressions, so let’s see more examples.

Regexp “decimal fraction” (a number with a floating point): \d+\.\d+

In action:

alert( "0 1 12.345 7890".match(/\d+\.\d+/g) ); // 12.345
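Note that \d+ stands on both sides of the dot, so the pattern requires at least one digit before and after it. A short sketch with an input of our own shows what that excludes (console.log replaces alert so the snippet also runs outside the browser):

```javascript
// Illustrative input: "12." has no digits after the dot,
// ".5" has none before it, "3.14" has both.
const numbersText = "12. .5 3.14";

const fractions = numbersText.match(/\d+\.\d+/g);

console.log(fractions); // [ '3.14' ]
```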

Regexp “open HTML-tag without attributes”, like <span> or <p>: /<[a-z]+>/i

In action:

alert( "<body> ... </body>".match(/<[a-z]+>/gi) ); // <body>

We look for the character '<' followed by one or more Latin letters, and then '>'.

Regexp “open HTML-tag without attributes” (improved): /<[a-z][a-z0-9]*>/i

A better regexp: according to the standard, an HTML tag name may have a digit at any position except the first one, like <h1>.

alert( "<h1>Hi!</h1>".match(/<[a-z][a-z0-9]*>/gi) ); // <h1>

Regexp “opening or closing HTML-tag without attributes”: /<\/?[a-z][a-z0-9]*>/i

We added an optional slash /? before the tag name. We had to escape it with a backslash, otherwise JavaScript would think it is the end of the pattern.

alert( "<h1>Hi!</h1>".match(/<\/?[a-z][a-z0-9]*>/gi) ); // <h1>, </h1>
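To make the limits of this pattern visible, the sketch below runs it on a slightly richer fragment of our own. The markup is illustrative; a tag with attributes and a name starting with a digit are expected to be skipped:

```javascript
// Illustrative markup: a plain tag pair, a tag with an attribute,
// and an invalid "tag" whose name starts with a digit.
const html = '<h1>Hi!</h1> <p class="x">text</p> <1a>';

const tags = html.match(/<\/?[a-z][a-z0-9]*>/gi);

// <p class="x"> is skipped (the pattern allows no attributes),
// <1a> is skipped (the first character must be a letter).
console.log(tags); // [ '<h1>', '</h1>', '</p>' ]
```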

More precise means more complex

We can see one common rule in these examples: the more precise the regular expression is, the longer and more complex it gets.

For instance, HTML tags could use a simpler regexp: <\w+>.

Because \w means any Latin letter, digit or '_', the regexp also matches non-tags, for instance <_>. But it’s much simpler than <[a-z][a-z0-9]*>.

Are we OK with <\w+>, or do we need <[a-z][a-z0-9]*>?

In real life both variants are acceptable. It depends on how tolerant we can be of “extra” matches and how difficult it is to filter them out by other means.
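A rough way to weigh the two variants is to run both on the same input and look at what only the simpler one picks up. The input string and the variable names below are our own, chosen for illustration:

```javascript
// Illustrative input: two real tag names and two non-tags with underscores.
const input = "<_> <h1> <a_b> <body>";

const simple = input.match(/<\w+>/g);
const precise = input.match(/<[a-z][a-z0-9]*>/gi);

// The "extra" matches are those found only by the simpler pattern.
const extra = simple.filter(match => !precise.includes(match));

console.log(simple);  // [ '<_>', '<h1>', '<a_b>', '<body>' ]
console.log(precise); // [ '<h1>', '<body>' ]
console.log(extra);   // [ '<_>', '<a_b>' ]
```

If inputs like <_> never occur in our data, the simpler pattern does the same job.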

Tasks

Create a regexp to find an ellipsis: 3 (or more?) dots in a row.

Check it:

let reg = /your regexp/g;
alert( "Hello!... How goes?.....".match(reg) ); // ..., .....

Solution:

let reg = /\.{3,}/g;
alert( "Hello!... How goes?.....".match(reg) ); // ..., .....

Please note that the dot is a special character, so we have to escape it and write it in the pattern as \. (a backslash followed by a dot).
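To see why the escape matters, here is a small sketch comparing the escaped and the unescaped dot on the same string (console.log replaces alert so the snippet also runs outside the browser):

```javascript
const phrase = "Hello!... How goes?.....";

// Escaped dot: matches only literal dots, 3 or more in a row.
const escaped = phrase.match(/\.{3,}/g);

// Unescaped dot: matches any character except a newline,
// so the whole string becomes one long match.
const unescaped = phrase.match(/.{3,}/g);

console.log(escaped);   // [ '...', '.....' ]
console.log(unescaped); // [ 'Hello!... How goes?.....' ]
```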

Create a regexp to search for HTML colors written as #ABCDEF: first # and then 6 hexadecimal characters.

An example of use:

let reg = /...your regexp.../

let str = "color:#121212; background-color:#AA00ef bad-colors:f#fddee #fd2 #12345678";

alert( str.match(reg) )  // #121212,#AA00ef

P.S. In this task we do not need other color formats like #123 or rgb(1,2,3) etc.

Solution:

We need to look for # followed by 6 hexadecimal characters.

A hexadecimal character can be described as [0-9a-fA-F]. Or, if we use the i flag, then just [0-9a-f].

Then we can look for 6 of them using the quantifier {6}.

As a result, we have the regexp: /#[a-f0-9]{6}/gi.

let reg = /#[a-f0-9]{6}/gi;

let str = "color:#121212; background-color:#AA00ef bad-colors:f#fddee #fd2";

alert( str.match(reg) );  // #121212,#AA00ef

The problem is that it also finds a “color” inside longer sequences, taking just their first 6 characters:

alert( "#12345678".match( /#[a-f0-9]{6}/gi ) ); // #123456

To fix that, we can add \b to the end:

// color
alert( "#123456".match( /#[a-f0-9]{6}\b/gi ) ); // #123456

// not a color
alert( "#12345678".match( /#[a-f0-9]{6}\b/gi ) ); // null
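
As a final check, the sketch below runs the fixed pattern on the full string from the task, including the overlong #12345678 at its end. The variable names are our own, and console.log replaces alert so the snippet also runs outside the browser:

```javascript
const colorPattern = /#[a-f0-9]{6}\b/gi;

// The full string from the task statement.
const css = "color:#121212; background-color:#AA00ef bad-colors:f#fddee #fd2 #12345678";

// #fddee and #fd2 are too short; in #12345678 there is no word boundary
// after the sixth hexadecimal character, so it is rejected as well.
const colors = css.match(colorPattern);

console.log(colors); // [ '#121212', '#AA00ef' ]
```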