Regular Expressions Tutorial

Regular expressions (regex) are a way to specify patterns for matching text in strings. Let’s jump in. Here’s a sentence:

The quick brown fox jumps over the lazy dog

And here’s a regular expression:

own
That expression searches the sentence for the string “own” anywhere in it, whether at the beginning of a word, inside a word, or anywhere else. Here the word “brown” contains “own”, so the search succeeds. What if we wanted to match a word made up of any single character followed by “umps”? Then we could use this:

“ .umps”
A dot represents any single character except a newline. Also notice the space before the dot. It means the match has to start with a space, so the pattern can only find a word that is preceded by a space (that is, not the first word of the string). Nothing checks what comes after “umps”, though, so “ jumpsuit” would also produce a match. In our sentence, there is one match for “ .umps”, which is “ jumps”.
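If you want to try these two searches yourself, here’s a minimal sketch using Python’s standard `re` module. It assumes the two patterns are `own` and ` .umps` (with the leading space), as described above. `re.search` returns the first match it finds, or `None` if there isn’t one.

```python
import re

sentence = "The quick brown fox jumps over the lazy dog"

# "own" is found anywhere, even in the middle of a word ("brown").
own_match = re.search("own", sentence)
print(own_match.group(), own_match.start())  # own 12

# " .umps": a space, then any single character, then "umps".
umps_match = re.search(" .umps", sentence)
print(repr(umps_match.group()))  # ' jumps'

# With no space before "jumps" (it's the first word), nothing matches.
first_word = re.search(" .umps", "jumps over the lazy dog")
print(first_word)  # None
```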
Check this one out:

quick|lazy
The pipe symbol separates alternative expressions to match against, so the regex above matches either the string “quick” or the string “lazy”.
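Here’s a quick Python sketch of alternation, assuming the pattern is `quick|lazy`. `re.findall` returns every non-overlapping match in the string.

```python
import re

sentence = "The quick brown fox jumps over the lazy dog"

# The pipe lists alternatives: "quick" OR "lazy".
matches = re.findall("quick|lazy", sentence)
print(matches)  # ['quick', 'lazy']
```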
This one will match against “lazy”, “lazer”, or “lazarus”:

laz(y|er|arus)
Using grouping parentheses, I created a subexpression. The pipe now chooses only among the endings that follow “laz”, instead of splitting the whole expression into separate alternatives.
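To see why the parentheses matter, here’s a Python sketch that compares a grouped and an ungrouped version. It assumes the grouped pattern is `laz(y|er|arus)`, which is my reconstruction from the description. `fullmatch` only succeeds if the pattern covers the whole word.

```python
import re

words = ["lazy", "lazer", "lazarus", "laz", "er"]

# Parentheses keep the alternation inside the group, after "laz".
grouped = re.compile(r"laz(y|er|arus)")
# Without them, the pipe splits the whole pattern into "lazy", "er", "arus".
ungrouped = re.compile(r"lazy|er|arus")

grouped_results = {w: bool(grouped.fullmatch(w)) for w in words}
ungrouped_results = {w: bool(ungrouped.fullmatch(w)) for w in words}
print(grouped_results)
# {'lazy': True, 'lazer': True, 'lazarus': True, 'laz': False, 'er': False}
print(ungrouped_results)
# {'lazy': True, 'lazer': False, 'lazarus': False, 'laz': False, 'er': True}
```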
Now let’s take a look at these:

  • ? = Matches the preceding element zero or one time.
  • * = Matches the preceding element zero or more times.
  • + = Matches the preceding element one or more times.

This one will match “fx”, “fix”, “fox”, “fax”, “fux”, etc:

f.?x
The dot means any character (besides newline), but the question mark means that the previous character (in this case: dot) may or may not be present.
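Here’s a small Python check of the optional dot, assuming the pattern is `f.?x`. I’m using `fullmatch` so the whole word has to fit the pattern.

```python
import re

pattern = re.compile(r"f.?x")

words = ["fx", "fix", "fox", "fax", "fux", "fiix"]
results = {w: bool(pattern.fullmatch(w)) for w in words}
print(results)
# {'fx': True, 'fix': True, 'fox': True, 'fax': True, 'fux': True, 'fiix': False}

# The dot does not match a newline by default.
newline_match = pattern.fullmatch("f\nx")
print(newline_match)  # None
```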
This will match “ac”, “abc”, “abbc”, “abbbc”, etc:

ab*c
The asterisk matches the previous character (b) zero or more times.
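A Python sketch of the asterisk, assuming the pattern is `ab*c`. The last two lines swap in `+` to show the difference between “zero or more” and “one or more”.

```python
import re

pattern = re.compile(r"ab*c")

words = ["ac", "abc", "abbc", "abbbc", "adc"]
results = {w: bool(pattern.fullmatch(w)) for w in words}
print(results)
# {'ac': True, 'abc': True, 'abbc': True, 'abbbc': True, 'adc': False}

# Swapping * for + requires at least one "b", so "ac" no longer matches.
plus_ac = bool(re.fullmatch(r"ab+c", "ac"))
print(plus_ac)  # False
```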
This will match “Welcome” enclosed by at least one pair of equal signs:

=+Welcome=+
So the above will match something like: “=========================Welcome==========================”
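Here’s a Python sketch, assuming the pattern is `=+Welcome=+`. Each `+` counts its own run of equal signs, so the two sides don’t have to be the same length.

```python
import re

pattern = re.compile(r"=+Welcome=+")

samples = ["=Welcome=", "=====Welcome==", "Welcome", "=Welcome"]
results = {s: bool(pattern.fullmatch(s)) for s in samples}
print(results)
# {'=Welcome=': True, '=====Welcome==': True, 'Welcome': False, '=Welcome': False}
```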
Use {m,n} to match the preceding element at least m and not more than n times. Use {m} (no comma) to match it exactly m times, and {m,} to match it m or more times.


The above will match “aaa”, “bbb”, or “ccc”.
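The original pattern for “aaa”, “bbb”, or “ccc” isn’t shown, so this Python sketch uses `a{3}|b{3}|c{3}` as one possible version. It also tries the `{m,n}`, `{m}` and `{m,}` forms. The `full` function is just a hypothetical helper for this example; it wraps `re.fullmatch`.

```python
import re

def full(pattern, text):
    """Hypothetical helper: True if `pattern` matches the whole of `text`."""
    return re.fullmatch(pattern, text) is not None

# {2,4}: at least 2 and at most 4 repetitions.
range_results = [full(r"a{2,4}", s) for s in ["a", "aa", "aaaa", "aaaaa"]]
print(range_results)  # [False, True, True, False]

# {3}: exactly 3.  {3,}: 3 or more.
exact = (full(r"a{3}", "aaa"), full(r"a{3}", "aaaa"))
print(exact)                      # (True, False)
at_least = full(r"a{3,}", "aaaaaa")
print(at_least)                   # True

# One possible pattern for "aaa", "bbb" or "ccc" (but not a mix like "abc").
triples = {s: full(r"a{3}|b{3}|c{3}", s) for s in ["aaa", "bbb", "ccc", "abc"]}
print(triples)  # {'aaa': True, 'bbb': True, 'ccc': True, 'abc': False}
```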
Another operator is the dash. Inside square brackets, it matches a range of characters, ordered by their ASCII codes. The starting character goes before the dash and the ending character goes after it.

[a-z][0-9]
This will match any lowercase letter followed by a digit from 0 through 9.
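A Python sketch of character ranges, assuming the pattern is `[a-z][0-9]`. The last part shows that the dash only builds a range inside brackets.

```python
import re

# [a-z]: one lowercase letter; [0-9]: one digit.
pairs = re.findall(r"[a-z][0-9]", "a1 B2 c3 x99 7z")
print(pairs)  # ['a1', 'c3', 'x9']

# The dash only means "range" inside brackets; outside, it is a literal "-".
in_class = re.findall(r"[a-c]", "abcd-")
print(in_class)  # ['a', 'b', 'c']
literal_dash = re.findall(r"a-c", "a-c abc")
print(literal_dash)  # ['a-c']
```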
Square brackets define a character class, which matches any one character from the set listed inside them. Brackets can also escape the other special pattern-matching characters: inside brackets, characters such as ., ?, *, + and $ are taken literally. The exception is ^ (along with -, ] and the backslash, which keep special roles). When ^ is the first character inside the brackets, the class matches any character that is not listed. The following will match any filename ending in .exe or .com:


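The filename pattern itself isn’t shown, so this Python sketch uses a hypothetical one, `.*[.](exe|com)`, where `[.]` stands for a literal dot. The last lines show what goes wrong with a bare, unescaped dot. `\.` would work just as well as `[.]`.

```python
import re

# Hypothetical pattern: any characters, a literal dot, then "exe" or "com".
# Inside brackets the dot loses its special meaning, so [.] matches only ".".
pattern = re.compile(r".*[.](exe|com)")

files = ["setup.exe", "command.com", "notes.txt", "setupexe", "archive.exe.bak"]
results = {f: bool(pattern.fullmatch(f)) for f in files}
print(results)
# {'setup.exe': True, 'command.com': True, 'notes.txt': False,
#  'setupexe': False, 'archive.exe.bak': False}

# With a bare dot instead of [.], "setupexe" slips through ("p" fills the dot).
loose = bool(re.fullmatch(r".*.(exe|com)", "setupexe"))
print(loose)  # True
```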
This will match any word not starting with ‘.’, ‘?’, or ‘$’:


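The pattern for this one isn’t shown either, so here’s a Python sketch with a hypothetical version, `[^.?$]\S*`. The negated class rejects a leading “.”, “?” or “$”, and `\S*` allows the rest of the word to be any non-whitespace characters.

```python
import re

# Hypothetical pattern: the first character must NOT be ".", "?" or "$"
# (inside brackets these are literal; the leading ^ negates the class),
# followed by any run of non-whitespace characters.
pattern = re.compile(r"[^.?$]\S*")

words = "hello .hidden ?query $price ok$".split()
kept = [w for w in words if pattern.fullmatch(w)]
print(kept)  # ['hello', 'ok$']
```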
Thank you for reading my tutorial. Now have a look at this quick reference for anchors and shorthand character classes:

^    =   Beginning of string (or line in multiline mode)
$    =   End of string (or line in multiline mode)
\w   =   [A-Za-z0-9_]
\W   =   [^A-Za-z0-9_]
\d   =   [0-9]
\D   =   [^0-9]
\s   =   [ \t\r\n\v\f]
\S   =   [^ \t\r\n\v\f]
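Here’s a Python sketch that tries a few entries from the table. One caveat specific to Python 3: for str patterns, `\w`, `\d` and `\s` are Unicode-aware by default. Passing `re.ASCII` restricts them to the ASCII sets listed above, and `re.MULTILINE` switches `^` and `$` to per-line matching.

```python
import re

text = "user_42 paid $19.50\tat 10:05"

words = re.findall(r"\w+", text)
print(words)   # ['user_42', 'paid', '19', '50', 'at', '10', '05']
digits = re.findall(r"\d+", text)
print(digits)  # ['42', '19', '50', '10', '05']
spaces = re.findall(r"\s", text)
print(spaces)  # [' ', ' ', '\t', ' ']

# ^ anchors to the start of the string unless re.MULTILINE is set.
lines = "first line\nsecond line"
starts = re.findall(r"^\w+", lines)
print(starts)        # ['first']
starts_multi = re.findall(r"^\w+", lines, re.MULTILINE)
print(starts_multi)  # ['first', 'second']

# re.ASCII limits \w to the [A-Za-z0-9_] set from the table.
uni = re.findall(r"\w+", "café")
print(uni)          # ['café']
ascii_only = re.findall(r"\w+", "café", re.ASCII)
print(ascii_only)   # ['caf']
```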
