.NET Regex Reference

About Regular Expressions

Regular expressions (often abbreviated "regex") are written in a formal language and provide a powerful and concise way to find complex patterns inside text. Although regular expressions can seem cryptic and confusing at first, they can also save you hours of writing procedural code to perform the same task.

Regular Expressions in the Microsoft .NET Framework

.NET provides a powerful regular expression engine modeled on the often-imitated Perl 5 implementation, so many Perl regular expressions work unchanged in .NET. For practical purposes, however, the .NET engine is a distinct implementation with features of its own. For example, RegexOptions.RightToLeft makes the engine search from the end of the input toward the beginning, which is useful when the text you want sits near the end of the input.

.NET also supports several constructs that Perl 5 lacks or supports only in limited form. Here are just a few:

Variable-width lookbehinds
(?<=abc.+)

Character class subtraction
[a-z-[e]]

Balancing group definitions
(?<close-open>")

There are also numerous syntactic differences, so a Perl regular expression may not behave the same way in .NET. When you must write .NET-compatible regular expressions, it is best to test them with a tool that runs on the .NET Regex engine, such as Regex Hero.
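As a rough illustration of the .NET-specific features named above, the following sketch exercises variable-width lookbehind, character class subtraction, and RegexOptions.RightToLeft. The class name, helper names, and sample inputs are invented for this example; only the Regex calls and the patterns come from .NET itself.

```csharp
using System;
using System.Text.RegularExpressions;

public static class DotNetOnlyFeatures
{
    // Variable-width lookbehind: a digit that has "abc" plus at least one
    // more character somewhere before it (on the same line).
    public static string DigitAfterAbc(string input) =>
        Regex.Match(input, @"(?<=abc.+)\d").Value;

    // Character class subtraction: any lowercase ASCII letter except 'e'.
    public static string MaskAllButE(string input) =>
        Regex.Replace(input, "[a-z-[e]]", "_");

    // RightToLeft: the search starts at the end of the input,
    // so the first match found is the last digit.
    public static string LastDigit(string input) =>
        Regex.Match(input, @"\d", RegexOptions.RightToLeft).Value;

    public static void Main()
    {
        Console.WriteLine(DigitAfterAbc("abc--7"));      // 7
        Console.WriteLine(DigitAfterAbc("abc7") == "");  // True (nothing between abc and 7)
        Console.WriteLine(MaskAllButE("remember"));      // _e_e__e_
        Console.WriteLine(LastDigit("a1b2c3"));          // 3
    }
}
```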

Syntax Reference for .NET Regular Expressions

The reference below is based on material provided by MSDN. For more help, see Microsoft's Developer Guide for Regular Expressions or fire up Regex Hero, which contains a reference as well as regex code completion.



Characters

The following expressions will match single characters. For more information see Microsoft's article on Character Classes.

Ordinary characters
Characters other than . $ ^ { [ ( | ) * + ? \ match themselves.
a matches a and b matches b
.
Matches any character excluding the line feed. Includes the line feed in single-line mode.
. matches a or 1 or almost anything else
[abc]
A character class (may contain more than one character). Matches any single character contained within the brackets; the order of the characters does not matter.
[abc] matches a, b, or c
[^abc]
The opposite of [ ]. Matches any single character not contained within the brackets.
[^abc] matches anything except a, b, or c
[a-z]
Character range: Matches any single character in the range from first (a) to last (z).
[a-z] matches a, m, or z
\w
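Matches a word character: a letter, decimal digit, or connector punctuation such as the underscore. Unicode-aware by default; with RegexOptions.ECMAScript it is limited to a-z, A-Z, 0-9, and underscore.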
\w matches a or b
\W
The opposite of \w. Matches any non-word character.
\W matches - but does not match a
\d
Matches a decimal digit. Unicode-aware by default (any character in \p{Nd}); with RegexOptions.ECMAScript it is limited to 0-9.
\d matches 1 or 2
\D
The opposite of \d. Matches any character that is not a decimal digit.
\D matches a or b
\s
Matches a whitespace character (space, tab, carriage return, line feed, form feed, vertical tab, and other Unicode separators).
a\sb matches a b
\S
The opposite of \s. Matches any non-whitespace character.
a\Sb matches a-b
\r
Matches a carriage return.
a\rb matches a and b separated by a carriage return
\n
Matches a new line (line feed).
a\nb matches a and b separated by a line feed
\f
Matches a form feed.
\t
Matches a tab.
a\tb matches a    b
\v
Matches a vertical tab.
\a
Matches a bell character.
\b
In a character class, matches a backspace.
\e
Matches an escape.
\040
Uses octal representation to specify a character (octal consists of up to three digits).
\x20
Uses hexadecimal representation to specify a character (hex consists of exactly two digits).
\cC
Matches the ASCII control character specified by the letter that follows \c. For example, \cC matches Ctrl-C (\x03).
\u0020
Matches a Unicode character by using hexadecimal representation (exactly four digits).
\p{name}
Matches any single character in the Unicode general category or named block specified by name.
\P{name}
Matches any single character that is not in the Unicode general category or named block specified by name.
\
In front of any of the special characters (. $ ^ { [ ( | ) * + ? \), this will match the character itself.
\$5 matches $5 and \\ matches \
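The short sketch below tries a few of the character escapes from this section. It assumes the default Unicode-aware behavior of \w and \d and contrasts it with RegexOptions.ECMAScript. Regex.Escape is a real .NET helper that this reference does not list; it is included as a convenient way to apply the backslash rule. The class and method names are illustrative only.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CharacterEscapes
{
    // True when the whole input is exactly one \w character.
    public static bool IsWordChar(string s, RegexOptions options = RegexOptions.None) =>
        Regex.IsMatch(s, @"^\w$", options);

    // True when the whole input is exactly one \d character.
    public static bool IsDigit(string s, RegexOptions options = RegexOptions.None) =>
        Regex.IsMatch(s, @"^\d$", options);

    // Counts characters in the Unicode "uppercase letter" category.
    public static int CountUppercase(string s) =>
        Regex.Matches(s, @"\p{Lu}").Count;

    // Escapes every special character in the literal before searching for it.
    public static bool ContainsLiteral(string input, string literal) =>
        Regex.IsMatch(input, Regex.Escape(literal));

    public static void Main()
    {
        Console.WriteLine(IsWordChar("\u00E9"));                          // True  (e with acute accent)
        Console.WriteLine(IsWordChar("\u00E9", RegexOptions.ECMAScript)); // False
        Console.WriteLine(IsDigit("\u0663"));                             // True  (Arabic-Indic digit three)
        Console.WriteLine(IsDigit("\u0663", RegexOptions.ECMAScript));    // False
        Console.WriteLine(CountUppercase("Hello World"));                 // 2
        Console.WriteLine(Regex.Escape("$5.00"));                         // \$5\.00
        Console.WriteLine(ContainsLiteral("Total: $5.00", "$5.00"));      // True

        // Octal, hexadecimal, and Unicode escapes that all denote a space.
        foreach (string p in new[] { @"^\040$", @"^\x20$", @"^\u0020$" })
            Console.WriteLine(Regex.IsMatch(" ", p));                     // True each time
    }
}
```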

Assertions

The following expressions specify the location to search for a match, but do not match anything themselves.

^
The match must start at the beginning of the string (or beginning of the line in multiline mode).
^cat matches cat but does not match bobcat
$
The match must occur at the end of the string or before \n at the end of the string (or end of the line in multiline mode).
dog$ matches dog but does not match dogfight
\A
The match must occur at the start of the string.
\Z
The match must occur at the end of the string or before \n at the end of the string.
\z
The match must occur at the end of the string.
\G
The match must occur at the point where the previous match ended.
\b
Asserts a boundary between word and non-word characters.
grape\b matches grape in "grape, cherry" but does not match grapefruit
\B
The opposite of \b. Asserts a location that is not a boundary between word and non-word characters.
grape\B matches grape in grapefruit but does not match grape in "grape, cherry"
(?=pattern)
Asserts that the specified pattern exists immediately after this location. Known as a positive lookahead.
too many(?= secrets) matches too many in "too many secrets" but does not match "too many" on its own
(?!pattern)
Asserts that the specified pattern does not exist immediately after this location. Known as a negative lookahead.
too many(?! secrets) matches too many but does not match too many secrets
(?<=pattern)
Asserts that the specified pattern exists immediately before this location. Known as a positive lookbehind.
(?<=too )many secrets matches many secrets in "too many secrets" but does not match "many secrets" on its own
(?<!pattern)
Asserts that the specified pattern does not exist immediately before this location. Known as a negative lookbehind.
(?<!too )many secrets matches many secrets but does not match too many secrets
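To make the zero-width nature of these assertions concrete, here is a small sketch that reuses the examples above and adds a trailing-newline case to separate $, \Z, and \z. The helper names and test strings are assumptions made for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AssertionDemo
{
    // Appends the given end anchor ($, \Z, or \z) to the literal "dog".
    public static bool EndsWithDog(string input, string anchor) =>
        Regex.IsMatch(input, "dog" + anchor);

    // Counts how many times "cat" appears where ^ is allowed to match.
    public static int CountCatAtStart(string input, RegexOptions options) =>
        Regex.Matches(input, "^cat", options).Count;

    // The lookahead checks for " secrets" but does not include it in the match.
    public static string Lookahead(string input) =>
        Regex.Match(input, @"too many(?= secrets)").Value;

    // The lookbehind checks for "too " but does not include it in the match.
    public static string Lookbehind(string input) =>
        Regex.Match(input, @"(?<=too )many secrets").Value;

    public static void Main()
    {
        Console.WriteLine(EndsWithDog("dog\n", "$"));    // True  (allows a final \n)
        Console.WriteLine(EndsWithDog("dog\n", @"\Z"));  // True  (allows a final \n)
        Console.WriteLine(EndsWithDog("dog\n", @"\z"));  // False (strict end of string)

        Console.WriteLine(CountCatAtStart("cat\ncat", RegexOptions.None));      // 1
        Console.WriteLine(CountCatAtStart("cat\ncat", RegexOptions.Multiline)); // 2

        Console.WriteLine(Lookahead("too many secrets"));   // too many
        Console.WriteLine(Lookbehind("too many secrets"));  // many secrets

        Console.WriteLine(Regex.IsMatch("grape, cherry", @"grape\b")); // True
        Console.WriteLine(Regex.IsMatch("grapefruit", @"grape\b"));    // False
    }
}
```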

Quantifiers

The following expressions will indicate a repetition of the previous character or group.

?
Repeat 0 or 1 time matching as many times as possible.
abc.? matches abc or abcd
*
Repeat 0 or more times matching as many times as possible.
abc.* matches abc or abcd or abcde
+
Repeat 1 or more times matching as many times as possible.
abc.+ matches abcd or abcde
??
Repeat 0 times or 1 time matching 0 times if possible.
abc?? matches ab in "abc"
*?
Repeat 0 or more times matching as few times as possible.
ab.*?c matches abc or ab c
+?
Repeat 1 or more times matching as few times as possible.
ab.+?c matches abbc or ab c but does not match abc
{n}
Repeat exactly n times.
\d{1} matches 5
{n,}
Repeat at least n times, matching as many times as possible.
\d{1,} matches 5 or 555
{n,}?
Repeat at least n times, matching as few times as possible.
\d{1,}? matches each 5 in 555 as a separate match
{n,m}
Repeat at least n times, but no more than m times.
\d{1,2} matches 5 or 55
{n,m}?
Repeat at least n times, but no more than m times while matching as few as possible.
\d{1,2}? matches each 5 in 55 as a separate match
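The difference between greedy and lazy quantifiers is easiest to see by comparing what each one actually returns. This is a minimal sketch; the First and Count helpers and the sample inputs are hypothetical names chosen for the example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class QuantifierDemo
{
    // Text of the first match, or an empty string when there is none.
    public static string First(string input, string pattern) =>
        Regex.Match(input, pattern).Value;

    // Number of non-overlapping matches in the input.
    public static int Count(string input, string pattern) =>
        Regex.Matches(input, pattern).Count;

    public static void Main()
    {
        // Greedy .+ runs to the last ">", lazy .+? stops at the first one.
        Console.WriteLine(First("<b>x</b>", "<.+>"));   // <b>x</b>
        Console.WriteLine(First("<b>x</b>", "<.+?>"));  // <b>

        // Greedy takes all three digits at once; lazy takes one per match.
        Console.WriteLine(Count("555", @"\d{1,}"));     // 1
        Console.WriteLine(Count("555", @"\d{1,}?"));    // 3

        // Lazy ?? prefers zero repetitions, so the trailing c is left out.
        Console.WriteLine(First("abc", "abc?"));        // abc
        Console.WriteLine(First("abc", "abc??"));       // ab

        // .+? still needs at least one character between ab and c.
        Console.WriteLine(First("abbc", "ab.+?c"));     // abbc
        Console.WriteLine(First("abc", "ab.+?c") == ""); // True
    }
}
```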

Grouping

The following expressions allow grouped matching.

(pattern)
Captures the specified pattern as a group. Each group is numbered automatically starting from 1. Group 0 is actually not a group at all but refers to the text matched by the entire regular expression.
(?<name>pattern)
Captures the specified pattern into the specified group name. The string used for the name must not contain any punctuation and cannot begin with a number.
(?<name1-name2>pattern)
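Defines a balancing group. Removes the most recent capture of name2 and stores in name1 the text between that capture and the current group.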
(?:pattern)
Does not capture the substring matched by this pattern. Known as a noncapturing group.
(?imnsx-imnsx:pattern)
Applies or disables the specified options within subexpression.
For more information see .NET's Regular Expression Options.
(?>pattern)
Nonbacktracking subexpression, also known as an atomic group. Once the subexpression has matched, the engine will not backtrack into it.
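The grouping constructs above can be tried out with a sketch like the following. The balanced-parentheses pattern is a commonly used idiom rather than something taken from this reference, and it uses (?<-open>...), the form of the balancing group with name1 omitted. All class, method, and group names are illustrative assumptions.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GroupingDemo
{
    // Named groups: read one part of an ISO-style date by name.
    public static string Year(string date) =>
        Regex.Match(date, @"(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})")
             .Groups["year"].Value;

    // Balancing group: each "(" pushes a capture onto the "open" stack and
    // each ")" pops one. The trailing (?(open)(?!)) forces a failure when
    // any "(" is still on the stack at the end.
    private const string Balanced =
        @"^(?:[^()]|(?<open>\()|(?<-open>\)))*(?(open)(?!))$";

    public static bool IsBalanced(string input) =>
        Regex.IsMatch(input, Balanced);

    // Atomic group: (?>a+) keeps every "a" it consumed, so "ab" cannot follow.
    public static bool AtomicMatches(string input) =>
        Regex.IsMatch(input, @"^(?>a+)ab");

    // Ordinary noncapturing group: a+ can give one "a" back to the "ab" part.
    public static bool NonCapturingMatches(string input) =>
        Regex.IsMatch(input, @"^(?:a+)ab");

    // Inline option: case-insensitivity applies only inside the group.
    public static bool InlineIgnoreCase(string input) =>
        Regex.IsMatch(input, @"^(?i:abc) def$");

    public static void Main()
    {
        Console.WriteLine(Year("2024-05-17"));          // 2024
        Console.WriteLine(IsBalanced("(a(b)c)"));       // True
        Console.WriteLine(IsBalanced("(a(b)c"));        // False
        Console.WriteLine(IsBalanced("a)b("));          // False
        Console.WriteLine(AtomicMatches("aaab"));       // False
        Console.WriteLine(NonCapturingMatches("aaab")); // True
        Console.WriteLine(InlineIgnoreCase("ABC def")); // True
        Console.WriteLine(InlineIgnoreCase("ABC DEF")); // False
    }
}
```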

Backreferences

A backreference allows a previously matched subexpression to be identified subsequently in the same regular expression.

\number
Backreference. Matches the value of a numbered subexpression.
\k<name>
Named backreference. Matches the value of a named expression.
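As a quick sketch of both backreference forms, the example below finds a repeated word with \1 and a doubled letter with \k<name>. The method names, the group name ch, and the sample sentences are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BackreferenceDemo
{
    // \1 must match exactly the text that group 1 captured.
    public static string RepeatedWord(string input) =>
        Regex.Match(input, @"\b(\w+)\s+\1\b").Value;

    // \k<ch> must match exactly the text that the group named "ch" captured.
    public static string DoubledLetter(string input) =>
        Regex.Match(input, @"(?<ch>\w)\k<ch>").Value;

    public static void Main()
    {
        Console.WriteLine(RepeatedWord("it is is fine"));  // is is
        Console.WriteLine(DoubledLetter("balloon"));       // ll
    }
}
```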

Substitutions

Substitutions are allowed only within replacement patterns.

$number
Substitutes the last substring matched by the specified group number. The numbering scheme for groups starts at 1 (0 represents the entire match).
${name}
Substitutes the last substring matched by a named group.
$&
Substitutes a copy of the entire match itself.
$`
Substitutes all the text of the input string before the match.
$'
Substitutes all the text of the input string after the match.
$+
Substitutes the last group captured.
$_
Substitutes the entire input string.
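The substitutions above are used in the replacement argument of Regex.Replace. The following sketch shows several of them; the helper names and inputs are invented for the example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SubstitutionDemo
{
    // $1 and $2 refer to the numbered groups.
    public static string SwapNames(string input) =>
        Regex.Replace(input, @"(\w+) (\w+)", "$2, $1");

    // ${name} refers to a named group.
    public static string ReorderDate(string input) =>
        Regex.Replace(input, @"(?<y>\d{4})-(?<m>\d{2})-(?<d>\d{2})", "${d}/${m}/${y}");

    // Thin wrapper so each substitution can be tried against the same input.
    public static string Replace(string input, string pattern, string replacement) =>
        Regex.Replace(input, pattern, replacement);

    public static void Main()
    {
        Console.WriteLine(SwapNames("John Smith"));       // Smith, John
        Console.WriteLine(ReorderDate("2024-05-17"));     // 17/05/2024
        Console.WriteLine(Replace("cat", "a", "[$&]"));   // c[a]t   (whole match)
        Console.WriteLine(Replace("abc", "b", "$`"));     // aac     (text before the match)
        Console.WriteLine(Replace("abc", "b", "$'"));     // acc     (text after the match)
        Console.WriteLine(Replace("abc", "b", "$_"));     // aabcc   (entire input)
    }
}
```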

Alternation

The following expressions allow either/or matching.

|
Acts as a logical OR. When between two characters or groups, matches one or the other.
(?(pattern)yes|no)
Matches the first pattern in the OR statement (yes) if the specified pattern is found at this point. Otherwise, matches the second pattern in the OR statement (no).
(?(name)yes|no)
Matches the first pattern in the OR statement (yes) if the specified named group has captured a match at this point. Otherwise, matches the second pattern in the OR statement (no).
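Here is a small sketch of plain alternation and both conditional forms. It writes the name-based test as (?(open)...), with the group name placed directly inside the parentheses, and wraps the expression-based test in an explicit lookahead. The patterns, group name, and method names are assumptions chosen for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AlternationDemo
{
    // Plain alternation inside a group.
    public static bool IsGreyOrGray(string input) =>
        Regex.IsMatch(input, @"^gr(a|e)y$");

    // Name-based conditional: require ")" only if the optional "(" was captured.
    public static bool IsAreaCode(string input) =>
        Regex.IsMatch(input, @"^(?<open>\()?\d{3}(?(open)\))$");

    // Expression-based conditional: if a digit comes next, require three
    // digits; otherwise require two lowercase letters.
    public static bool IsCode(string input) =>
        Regex.IsMatch(input, @"^(?(?=\d)\d{3}|[a-z]{2})$");

    public static void Main()
    {
        Console.WriteLine(IsGreyOrGray("grey"));  // True
        Console.WriteLine(IsGreyOrGray("gray"));  // True
        Console.WriteLine(IsAreaCode("(555)"));   // True
        Console.WriteLine(IsAreaCode("555"));     // True
        Console.WriteLine(IsAreaCode("(555"));    // False
        Console.WriteLine(IsAreaCode("555)"));    // False
        Console.WriteLine(IsCode("123"));         // True
        Console.WriteLine(IsCode("ab"));          // True
        Console.WriteLine(IsCode("12"));          // False
    }
}
```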

Comments

The following expression allows comments to be inserted in your regular expression.

(?#comment)
Everything from the pound sign (#) to the end parenthesis is a comment and will be ignored.
#comment
X-mode comment. When RegexOptions.IgnorePatternWhitespace (the x option) is enabled, the comment starts at an unescaped # and continues to the end of the line.
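Both comment styles can be sketched as follows. The phone-number pattern, group names, and helper names are hypothetical; the point is only that the comments and the extra whitespace do not change what the pattern matches.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CommentDemo
{
    // With IgnorePatternWhitespace, unescaped whitespace is ignored and
    // "#" starts a comment that runs to the end of the line.
    private const string Commented = @"
        ^(?<area>\d{3})   # three-digit area code
        -                 # literal separator
        (?<line>\d{4})$   # four-digit line number
    ";

    public static string AreaCode(string input) =>
        Regex.Match(input, Commented, RegexOptions.IgnorePatternWhitespace)
             .Groups["area"].Value;

    // (?#...) comments work without any option.
    public static bool InlineCommentMatches(string input) =>
        Regex.IsMatch(input, @"^\d{3}(?#area code)-\d{4}(?#line number)$");

    public static void Main()
    {
        Console.WriteLine(AreaCode("555-1234"));              // 555
        Console.WriteLine(InlineCommentMatches("555-1234"));  // True
    }
}
```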

Example

To put some of this to use let's take a classic example:
The quick brown fox jumps over the lazy dog

Let's say we didn't care that the fox is brown, or that the dog is lazy but we wanted to match the whole string regardless of these details.

So the regular expression would be something like this:
^The quick\s?\w* fox jumps over the\s?\w* dog$

This would match the string even if the fox was red and the dog was hungry.
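To check the claim above, the pattern can be run against a few variations of the sentence. This is only a sketch; the class name and the extra test sentences are assumptions added for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class FoxExample
{
    // The pattern from the example: an optional single word before "fox" and before "dog".
    private const string Pattern =
        @"^The quick\s?\w* fox jumps over the\s?\w* dog$";

    public static bool Matches(string sentence) =>
        Regex.IsMatch(sentence, Pattern);

    public static void Main()
    {
        Console.WriteLine(Matches("The quick brown fox jumps over the lazy dog"));  // True
        Console.WriteLine(Matches("The quick red fox jumps over the hungry dog"));  // True
        Console.WriteLine(Matches("The quick fox jumps over the dog"));             // True
        Console.WriteLine(Matches("The quick brown cat jumps over the lazy dog"));  // False
    }
}
```

Note that \w* allows only a single word in each position, so a sentence such as "The quick light brown fox jumps over the lazy dog" would not match.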

Try it in the regex tester to see for yourself.


Also check out the public library to see practical examples.