Tip: Regular Expression Precedence Made Simple (as Arithmetic)
The basic operations in Perl regular expressions are repetition, sequence, and alternation. That is also - from highest to lowest (tightest-binding to loosest-binding) - their order of precedence. A super-quick review first:
a* # repetition - the character a repeated zero or more times
b+ # repetition - the character b repeated one or more times
x{1,3} # repetition - the character x repeated one to three times
abc # sequence - the character a, then the character b, then the character c
a|b|c # alternation - the character a, or the character b, or the character c
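If you want to poke at these yourself, here is a minimal sketch. It uses Python's re module rather than Perl (an assumption of convenience on my part; these basic constructs behave the same way in both), and matches() is a hypothetical helper that is not part of any library. It returns True only when the whole string matches the pattern.

```python
import re

def matches(pattern, text):
    """Hypothetical helper: True only if the entire text matches the pattern."""
    return re.fullmatch(pattern, text) is not None

print(matches(r"a*", ""))           # True  - zero a's is allowed
print(matches(r"b+", ""))           # False - at least one b is required
print(matches(r"x{1,3}", "xxx"))    # True  - three x's is within one to three
print(matches(r"x{1,3}", "xxxx"))   # False - four x's is too many
print(matches(r"abc", "abc"))       # True  - a, then b, then c
print(matches(r"a|b|c", "b"))       # True  - b is one of the alternatives
```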
It's important to understand precedence in regular expressions. For example, abc{3} means the characters 'ab' followed by three instances of the character 'c'. When I see something like abc{3}, I usually think that the author really meant "three instances of the characters 'abc'", which is written differently:
(abc){3}
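The difference between the two patterns is easy to check. This is just a sketch in Python's re module (assumed here to treat these constructs the same way Perl does), with a hypothetical matches() helper that requires the whole string to match:

```python
import re

def matches(pattern, text):
    """Hypothetical helper: True only if the entire text matches the pattern."""
    return re.fullmatch(pattern, text) is not None

# abc{3}: the {3} binds only to the c right before it
print(matches(r"abc{3}", "abccc"))        # True
print(matches(r"abc{3}", "abcabcabc"))    # False

# (abc){3}: the parentheses make {3} apply to the whole group
print(matches(r"(abc){3}", "abcabcabc"))  # True
print(matches(r"(abc){3}", "abccc"))      # False
```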
As you can see, you can use parentheses to control the order in which the bits of a regular expression are interpreted. I like to make an analogy to mathematical (algebraic) expressions. Even though a regular expression isn't a mathematical expression, the syntax is at least somewhat similar, especially where precedence is concerned. From the standpoint of precedence, you can think of a{3} as being something like x^10 - exponentiation, the highest-precedence operation in algebraic notation. abc is like xyz (the variables x, y, and z multiplied together) - multiplication having intermediate precedence - and a|b|c is like x + y + z - addition having low precedence. This becomes useful when you try to figure out things like:
a|b|c # the character a, or the character b, or the character c
a|b|c{2} # the character a, the character b, or two c's in a row
# like a + b + c^2
(a|b|c){2} # one of a or b or c followed by one of a or b or c
# like (a + b + c)^2
(a|b|c)+ # one or more a or b or c
(abc)+ # abc one or more times in a row (abc, abcabc, abcabcabc, etc.)
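Here is a small sketch that checks those readings. It assumes Python's re module as a stand-in for Perl (the precedence of these operations is the same in both), and matches() is a hypothetical whole-string helper, not a library function:

```python
import re

def matches(pattern, text):
    """Hypothetical helper: True only if the entire text matches the pattern."""
    return re.fullmatch(pattern, text) is not None

# a|b|c{2}: the {2} applies to c alone, like a + b + c^2
print(matches(r"a|b|c{2}", "cc"))    # True
print(matches(r"a|b|c{2}", "c"))     # False - a lone c is not an alternative
print(matches(r"a|b|c{2}", "aa"))    # False - the {2} never touches a

# (a|b|c){2}: the {2} applies to the whole group, like (a + b + c)^2
print(matches(r"(a|b|c){2}", "aa"))  # True
print(matches(r"(a|b|c){2}", "cb"))  # True
print(matches(r"(a|b|c){2}", "c"))   # False - two characters are required

# (abc)+: the whole sequence abc repeats
print(matches(r"(abc)+", "abcabc"))  # True
print(matches(r"(abc)+", "abca"))    # False
```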
So, think:
- Repetition: exponentiation (highest)
- Sequence: multiplication (middle)
- Alternation: addition (lowest)
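The same rule covers sequence versus alternation: ab|cd reads like xy + zw (ab, or cd), not like x(y + z)w. The patterns below are my own illustration rather than ones from the examples above, sketched in Python's re module (assumed to agree with Perl on this) with a hypothetical matches() helper:

```python
import re

def matches(pattern, text):
    """Hypothetical helper: True only if the entire text matches the pattern."""
    return re.fullmatch(pattern, text) is not None

# ab|cd: sequence binds tighter than alternation, so this is (ab)|(cd)
print(matches(r"ab|cd", "ab"))     # True
print(matches(r"ab|cd", "cd"))     # True
print(matches(r"ab|cd", "abd"))    # False

# a(b|c)d: parentheses force the alternation to happen first
print(matches(r"a(b|c)d", "abd"))  # True
print(matches(r"a(b|c)d", "acd"))  # True
print(matches(r"a(b|c)d", "ab"))   # False
```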
Now, the usefulness of all this depends on arithmetic (or algebra) being easy, which may be something else altogether.