### Tip: Regular Expression Precedence Made Simple (as Arithmetic)

The basic operations in Perl regular expressions are repetition, sequence, and alternation. That is also their order of precedence, from highest to lowest (tightest-binding to loosest-binding). A super-quick review first:

a*      # repetition - the character a repeated zero or more times

b+      # repetition - the character b repeated one or more times

x{1,3}  # repetition - the character x repeated one to three times

abc     # sequence - the character a, then the character b, then the character c

a|b|c   # alternation - the character a, or the character b, or the character c
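The review above can be checked mechanically. The following is a minimal sketch that uses Python's `re` module as a stand-in for Perl (an assumption made here for convenience; for these basic operators the two share the same syntax). The helper name `check` and the sample strings are illustrative, not part of the original tip.

```python
import re

def check(pattern, text):
    """Return True if the whole of text matches pattern."""
    return re.fullmatch(pattern, text) is not None

# Each entry: (pattern, strings that match in full, strings that do not).
cases = [
    (r"a*",     ["", "a", "aaa"],   ["b"]),
    (r"b+",     ["b", "bbb"],       [""]),
    (r"x{1,3}", ["x", "xx", "xxx"], ["", "xxxx"]),
    (r"abc",    ["abc"],            ["ab", "abcc"]),
    (r"a|b|c",  ["a", "b", "c"],    ["ab", "d"]),
]

for pattern, good, bad in cases:
    assert all(check(pattern, s) for s in good)
    assert not any(check(pattern, s) for s in bad)
```

`re.fullmatch` is used rather than `re.search` so that each pattern has to account for the entire string, which makes the boundaries of each operation easy to see.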

It's important to understand precedence in regular expressions. For example:

abc{3}

means the characters `'ab'` followed by *three* instances of the character `'c'`. When I see something like `abc{3}`, I usually think that the author really meant "three instances of the characters `'abc'`" - which is written differently:

(abc){3}
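To see the difference concretely, here is a small sketch, again assuming Python's `re` module as a stand-in for Perl (the two agree on how `{3}` binds). The helper name `full` is hypothetical.

```python
import re

def full(pattern, text):
    """Return True if the whole of text matches pattern."""
    return re.fullmatch(pattern, text) is not None

# abc{3}: the {3} binds only to the c, so this is "ab" followed by "ccc".
ungrouped = r"abc{3}"
# (abc){3}: the parentheses make {3} apply to the whole sequence abc.
grouped = r"(abc){3}"

assert full(ungrouped, "abccc")
assert not full(ungrouped, "abcabcabc")

assert full(grouped, "abcabcabc")
assert not full(grouped, "abccc")
```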

As you can see, you can use parentheses to control the order in which the bits of a regular expression are interpreted. I like to make an analogy to mathematical (algebraic) expressions. Even though a regular expression isn't a mathematical expression, the syntax is at least somewhat similar, especially where precedence is concerned. From the standpoint of precedence, you can think of `a{3}` as being something like *x^10* - exponentiation, the highest-precedence operation in algebraic notation. `abc` is like *xyz* (the variables *x*, *y*, and *z* multiplied together) - multiplication having intermediate precedence - and `a|b|c` is like *x + y + z* - addition having low precedence. This becomes useful when you try to figure out things like:

a|b|c       # the character a, or the character b, or the character c

a|b|c{2}    # the character a, the character b, or two c's in a row

            # like a + b + c^2

(a|b|c){2}  # one of a or b or c followed by one of a or b or c

            # like (a + b + c)^2

(a|b|c)+    # one or more a or b or c

(abc)+      # abc one or more times in a row (abc, abcabc, abcabcabc, etc.)
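The readings above can be verified by running each pattern against a fixed list of candidate strings. This is a sketch under the same assumption as before, that Python's `re` module treats these operators as Perl does. The `matches` helper and the candidate list are made up for illustration.

```python
import re

def matches(pattern, candidates):
    """Return the candidates that the pattern matches in full."""
    return [s for s in candidates if re.fullmatch(pattern, s)]

candidates = ["a", "b", "c", "cc", "ab", "ca", "abc", "abcabc"]

# Repetition binds tighter than alternation: a, or b, or cc.
print(matches(r"a|b|c{2}", candidates))    # ['a', 'b', 'cc']

# Parentheses first: exactly two characters, each one of a, b, or c.
print(matches(r"(a|b|c){2}", candidates))  # ['cc', 'ab', 'ca']

# Any non-empty run of a, b, and c: every candidate qualifies.
print(matches(r"(a|b|c)+", candidates) == candidates)  # True

# The whole sequence abc, repeated.
print(matches(r"(abc)+", candidates))      # ['abc', 'abcabc']
```

Note that a single `c` is rejected by `a|b|c{2}`, just as *c* alone is not a term of *a + b + c^2*.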

So, think:

- Repetition: exponentiation (highest)
- Sequence: multiplication (middle)
- Alternation: addition (lowest)

Now, the usefulness of all this depends on arithmetic (or algebra) being easy for you, which may be another matter altogether.
