Regex Cheat Sheet

From KdjWiki

Basic Expressions

'^' matches start:

 "^The"		matches any string that starts with "The"

'$' matches end:

 "of despair$"	matches a string that ends in the substring "of despair"

other examples:

 "^abc$"	matches a string that starts and ends with "abc" -- that could only be "abc" itself
 "notice"	matches a string that has the text "notice" in it
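
As a rough sketch, the anchor examples above can be tried with Python's re module. The helper name found is made up for illustration; re.search is used because it scans the whole string, so only the anchors decide where a match may sit.

```python
import re

def found(pattern, text):
    """Return True if pattern matches somewhere in text."""
    return re.search(pattern, text) is not None

print(found(r"^The", "The end"))                    # True: starts with "The"
print(found(r"^The", "At The end"))                 # False: "The" is not at the start
print(found(r"of despair$", "depths of despair"))   # True: ends in "of despair"
print(found(r"^abc$", "abc"))                       # True
print(found(r"^abc$", "abcabc"))                    # False: more than just "abc"
print(found(r"notice", "take notice of this"))      # True: "notice" appears anywhere
```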

'*' means "zero or more" (of the previous character)

 "ab*"		matches a string that has an a followed by zero or more b's ("a", "ab", "abbb", etc.)

'+' means "one or more" (of the previous character)

 "ab+"		same, but there's at least one b ("ab", "abbb", etc.)

'?' means "zero or one" (of the previous character)

 "ab?"		there might be a b or not

combinations:

 "a?b+$"	an optional a followed by one or more b's at the end of a string
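
A small sketch of the three quantifiers, again assuming Python's re module. The helper names whole and found are hypothetical; re.fullmatch is used so that the entire sample string has to fit the pattern.

```python
import re

def whole(pattern, text):
    """Return True if the entire text fits the pattern."""
    return re.fullmatch(pattern, text) is not None

def found(pattern, text):
    """Return True if pattern matches somewhere in text."""
    return re.search(pattern, text) is not None

samples = ["a", "ab", "abb", "abbb"]

star = [s for s in samples if whole(r"ab*", s)]      # zero or more b's
plus = [s for s in samples if whole(r"ab+", s)]      # one or more b's
optional = [s for s in samples if whole(r"ab?", s)]  # zero or one b

print(star)      # ['a', 'ab', 'abb', 'abbb']
print(plus)      # ['ab', 'abb', 'abbb']
print(optional)  # ['a', 'ab']

# The combination "a?b+$" only cares about how the string ends.
print(found(r"a?b+$", "xabb"))  # True
print(found(r"a?b+$", "abba"))  # False: the string does not end in b
```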

Bounds (which indicate ranges) are denoted by braces:

 "ab{2}"	matches a string that has an a followed by exactly two b's ("abb")
 "ab{2,}"	matches a string where there are at least two b's ("abb", "abbbb", etc.)
 "ab{3,5}"	matches a string which has from three to five b's ("abbb", "abbbb", or "abbbbb")

Note: always specify the first number of a range (i.e., "{0,2}", not "{,2}"); some engines accept the short form, but it is not portable. Also, as you might have noticed, the symbols '*', '+', and '?' have the same effect as the bounds "{0,}", "{1,}", and "{0,1}", respectively.
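
The bounds, and their equivalence to '*', '+', and '?', can be checked with a short sketch like the one below. It assumes Python's re module; the names samples and accepted are illustrative only.

```python
import re

# "a" followed by zero to six b's
samples = ["a" + "b" * n for n in range(7)]

def accepted(pattern):
    """Return the samples that fit the pattern from start to end."""
    return [s for s in samples if re.fullmatch(pattern, s)]

print(accepted(r"ab{2}"))    # ['abb']
print(accepted(r"ab{3,5}"))  # ['abbb', 'abbbb', 'abbbbb']

# Each shorthand accepts exactly the same samples as its bound.
print(accepted(r"ab*") == accepted(r"ab{0,}"))   # True
print(accepted(r"ab+") == accepted(r"ab{1,}"))   # True
print(accepted(r"ab?") == accepted(r"ab{0,1}"))  # True
```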

To quantify a sequence of characters, put them inside parentheses:

 "a(bc)*"	matches a string that has an a followed by zero or more copies of the sequence "bc"
 "a(bc){1,5}"	one through five copies of "bc"
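
A quick sketch of quantified groups, assuming Python's re module (the helper name whole is hypothetical). Because the quantifier applies to the whole group, a trailing lone "b" does not count as another copy.

```python
import re

def whole(pattern, text):
    """Return True if the entire text fits the pattern."""
    return re.fullmatch(pattern, text) is not None

print(whole(r"a(bc)*", "a"))       # True: zero copies of "bc"
print(whole(r"a(bc)*", "abcbc"))   # True: two copies
print(whole(r"a(bc)*", "abcb"))    # False: the last "b" is not a full "bc"

print(whole(r"a(bc){1,5}", "a"))             # False: at least one copy is required
print(whole(r"a(bc){1,5}", "a" + "bc" * 5))  # True
print(whole(r"a(bc){1,5}", "a" + "bc" * 6))  # False: six copies is too many
```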

'|' acts as an OR operator:

 "hi|hello"	matches a string that has either "hi" or "hello" in it
 "(b|cd)ef"	a string that has either "bef" or "cdef"
 "(a|b)*c"	a string that has any sequence of a's and b's (in any order, possibly empty) followed by a c
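
The OR operator can be sketched as follows, assuming Python's re module; found and whole are made-up helper names. Note that "hi|hello" also matches inside longer words, and that "(a|b)*c" accepts a bare "c" because the group may repeat zero times.

```python
import re

def found(pattern, text):
    """Return True if pattern matches somewhere in text."""
    return re.search(pattern, text) is not None

def whole(pattern, text):
    """Return True if the entire text fits the pattern."""
    return re.fullmatch(pattern, text) is not None

print(found(r"hi|hello", "say hello"))  # True
print(found(r"hi|hello", "this"))       # True: "hi" sits inside "this"

print(found(r"(b|cd)ef", "bef"))   # True
print(found(r"(b|cd)ef", "cdef"))  # True
print(found(r"(b|cd)ef", "cef"))   # False: neither "b" nor "cd" precedes "ef"

print(whole(r"(a|b)*c", "c"))      # True: zero a's and b's
print(whole(r"(a|b)*c", "aabbc"))  # True: the letters need not alternate
print(whole(r"(a|b)*c", "abca"))   # False: something follows the c
```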

Bracket expressions specify which characters are allowed in a single position of a string:

 "[ab]"	matches a string that has either an a or a b (that's the same as "a|b")
 "[a-d]"	a string that has one of the lowercase letters 'a' through 'd' (that's equal to "a|b|c|d" and to "[abcd]")
 "^[a-zA-Z]"	a string that starts with a letter
 "[0-9]%"	a string that has a single digit before a percent sign
 ",[a-zA-Z0-9]$"	a string that ends in a comma followed by an alphanumeric character
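
Here is a minimal sketch of the bracket expressions above, assuming Python's re module (found is a hypothetical helper). Each bracket expression stands for exactly one character position.

```python
import re

def found(pattern, text):
    """Return True if pattern matches somewhere in text."""
    return re.search(pattern, text) is not None

print(found(r"[ab]", "cab"))   # True
print(found(r"[ab]", "xyz"))   # False
print(found(r"[a-d]", "e"))    # False: 'e' is outside the range

print(found(r"^[a-zA-Z]", "hello"))   # True
print(found(r"^[a-zA-Z]", "9lives"))  # False: starts with a digit

print(found(r"[0-9]%", "save 5% today"))  # True
print(found(r"[0-9]%", "save % today"))   # False: no digit before the %

print(found(r",[a-zA-Z0-9]$", "a,b"))   # True
print(found(r",[a-zA-Z0-9]$", "a,b!"))  # False: the string ends in "!"
```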

You can also list which characters you DON'T want -- just use a '^' as the first symbol in a bracket expression:

 "%[^a-zA-Z]%"	matches a string with a character that is not a letter between two percent signs
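
A negated bracket expression still consumes exactly one character; it just has to be a character outside the listed set. A small sketch, assuming Python's re module and a hypothetical helper found:

```python
import re

def found(pattern, text):
    """Return True if pattern matches somewhere in text."""
    return re.search(pattern, text) is not None

print(found(r"%[^a-zA-Z]%", "%5%"))  # True: a digit is not a letter
print(found(r"%[^a-zA-Z]%", "% %"))  # True: a space is not a letter
print(found(r"%[^a-zA-Z]%", "%a%"))  # False: a letter sits between the signs
print(found(r"%[^a-zA-Z]%", "%%"))   # False: there is no character in between
```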

'.' stands for any single character:

 "a.[0-9]"	matches a string that has an a followed by one character and a digit
 "^.{3}$"	a string with exactly 3 characters
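
The dot examples can be sketched like this, assuming Python's re module (found is a made-up helper). One engine-specific detail to be aware of: in Python, '.' does not match a newline unless the re.DOTALL flag is given.

```python
import re

def found(pattern, text, flags=0):
    """Return True if pattern matches somewhere in text."""
    return re.search(pattern, text, flags) is not None

print(found(r"a.[0-9]", "ax7"))  # True: 'x' fills the '.' position
print(found(r"a.[0-9]", "a7"))   # False: nothing sits between a and the digit

print(found(r"^.{3}$", "abc"))   # True: exactly 3 characters
print(found(r"^.{3}$", "abcd"))  # False: 4 characters

# Python-specific: '.' skips newlines by default.
print(found(r"a.[0-9]", "a\n7"))             # False
print(found(r"a.[0-9]", "a\n7", re.DOTALL))  # True
```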

The characters "^.[$()|*+?{\" have special meaning. To match any of them literally, escape it with a backslash ('\').
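
A short sketch of escaping, assuming Python's re module. The helper found and the variable names are illustrative; re.escape is a standard-library function that adds the backslashes for you when the text to match comes from elsewhere.

```python
import re

def found(pattern, text):
    """Return True if pattern matches somewhere in text."""
    return re.search(pattern, text) is not None

# Unescaped, '+' is a quantifier: "1+1" means one or more 1's, then a 1.
print(found(r"1+1", "1+1"))   # False: the text has no two 1's in a row
print(found(r"1\+1", "1+1"))  # True: the escaped '+' is a literal plus sign

# Unescaped, '.' matches any character.
print(found(r"a.b", "axb"))   # True
print(found(r"a\.b", "axb"))  # False: only a real dot will do

# re.escape backslash-escapes every special character in a string.
specials = "^.[$()|*+?{\\"
escaped = re.escape(specials)
print(re.fullmatch(escaped, specials) is not None)  # True
```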