Alternation is the regular expression term for a simple “OR”.
In a regular expression it is denoted with the vertical line character |.
For instance, we need to find programming languages: HTML, PHP, Java or JavaScript.
The regexp html|php|java(script)? does exactly that.
A usage example (with css added to the list of alternatives):
let regexp = /html|php|css|java(script)?/gi;
let str = "First HTML appeared, then CSS, then JavaScript";
alert( str.match(regexp) ); // 'HTML', 'CSS', 'JavaScript'
We already saw a similar thing – square brackets. They allow choosing between multiple characters, for instance gr[ae]y matches gray or grey.
Square brackets allow only characters or character classes. Alternation allows any expressions. A regexp A|B|C means one of the expressions A, B or C.
For instance:
- gr(a|e)y means exactly the same as gr[ae]y.
- gra|ey means gra or ey.
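The difference between these patterns is easy to check directly. The snippet below is a minimal sketch: the sample string is made up for illustration, and console.log is used instead of alert so that it also runs outside a browser.

```javascript
// Sample input chosen for illustration only.
const colorWords = "gray grey";

// Alternation inside parentheses and a character set select the same words.
const withGroup = colorWords.match(/gr(a|e)y/g);
const withSet = colorWords.match(/gr[ae]y/g);

// Without parentheses the alternatives are the whole left and right parts: "gra" or "ey".
const bareAlternation = colorWords.match(/gra|ey/g);

console.log(withGroup);       // ["gray", "grey"]
console.log(withSet);         // ["gray", "grey"]
console.log(bareAlternation); // ["gra", "ey"]
```

Note that with the g flag, match returns only the full matches, so the parentheses in the first pattern do not change the result.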
To apply alternation to a chosen part of the pattern, we can enclose it in parentheses:
- I love HTML|CSS matches I love HTML or CSS.
- I love (HTML|CSS) matches I love HTML or I love CSS.
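To see how parentheses limit the scope of the alternation, here is a small sketch. The test sentence is an assumption made up for this example, and console.log stands in for alert.

```javascript
// Illustrative sentence, not taken from the article.
const sentence = "I love HTML, and CSS is nice too. I love CSS!";

// Without parentheses: either the whole phrase "I love HTML" or just "CSS".
const ungrouped = sentence.match(/I love HTML|CSS/g);

// With parentheses: "I love " followed by either "HTML" or "CSS".
const grouped = sentence.match(/I love (HTML|CSS)/g);

console.log(ungrouped); // ["I love HTML", "CSS", "CSS"]
console.log(grouped);   // ["I love HTML", "I love CSS"]
```

The ungrouped pattern picks up every standalone CSS, while the grouped one only matches when the words I love come first.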
Example: regexp for time
In previous articles there was a task to build a regexp for searching time in the form hh:mm, for instance 12:00. But a simple \d\d:\d\d is too vague. It accepts 25:99 as a time, because any two digits match each part, yet neither 25 hours nor 99 minutes is valid.
How can we make a better pattern?
We can use more careful matching. First, the hours:
- If the first digit is 0 or 1, then the next digit can be any: [01]\d.
- Otherwise, if the first digit is 2, then the next must be [0-3].
- (no other first digit is allowed)
We can write both variants in a regexp using alternation: [01]\d|2[0-3].
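As a quick check of the hours pattern on its own, the sketch below runs it over a few space-separated two-digit values. The list of candidates is an assumption chosen for illustration, and console.log replaces alert.

```javascript
// Candidate hour values, made up for this check: four valid ones and two invalid ones.
const hourCandidates = "00 09 19 23 24 31";

// Either 0 or 1 followed by any digit, or 2 followed by a digit from 0 to 3.
const hoursPattern = /[01]\d|2[0-3]/g;

const validHours = hourCandidates.match(hoursPattern);
console.log(validHours); // ["00", "09", "19", "23"]
```

The values 24 and 31 are skipped because neither alternative can match them.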
Next, minutes must be from 00 to 59. In the regular expression language that can be written as [0-5]\d: the first digit 0-5, and then any digit.
If we glue hours and minutes together, we get the pattern: [01]\d|2[0-3]:[0-5]\d.
We’re almost done, but there’s a problem. The alternation | now happens to be between [01]\d and 2[0-3]:[0-5]\d.
That is, the minutes are attached only to the second alternation variant. Here’s a clear picture:
[01]\d | 2[0-3]:[0-5]\d
That pattern looks for [01]\d or 2[0-3]:[0-5]\d.
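Running the glued pattern makes the problem visible. This is a minimal sketch with a made-up test string, again using console.log rather than alert.

```javascript
// The pattern without parentheses around the hours part.
const gluedPattern = /[01]\d|2[0-3]:[0-5]\d/g;

// Two valid times, chosen for illustration.
const gluedMatches = "12:30 23:59".match(gluedPattern);

// For "12:30" only the first alternative applies, so just the hours "12" are found.
// For "23:59" the second alternative applies and the whole time is found.
console.log(gluedMatches); // ["12", "23:59"]
```

A time starting with 0 or 1 loses its minutes, because the first alternative ends right after the two hour digits.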
But that’s wrong: the alternation should only be used in the “hours” part of the regular expression, to allow [01]\d OR 2[0-3]. Let’s correct that by enclosing “hours” in parentheses: ([01]\d|2[0-3]):[0-5]\d.
The final solution:
let regexp = /([01]\d|2[0-3]):[0-5]\d/g;
alert("00:00 10:10 23:59 25:99 1:2".match(regexp)); // 00:00,10:10,23:59