Regex Cheat Sheet
From KdjWiki
Basic Expressions
'^' matches start:
"^The" matches any string that starts with "The"
'$' matches end:
"of despair$" matches a string that ends in the substring "of despair"
other examples:
"^abc$" matches a string that starts and ends with "abc" -- that could only be "abc" itself "notice" matches a string that has the text "notice" in it
'*' means "zero or more" (of the previous character)
"ab*" matches a string that has an a followed by zero or more b's ("a", "ab", "abbb", etc.)
'+' means "one or more" (of the previous character)
"ab+" same, but there's at least one b ("ab", "abbb", etc.)
'?' means "zero or one" (of the previous character)
"ab?" there might be a b or not
combinations:
"a?b+$" a possible a followed by one or more b's ending a string
Bounds (which indicate ranges) are denoted by braces:
"ab{2}" matches a string that has an a followed by exactly two b's ("abb")
"ab{2,}" matches a string where there are at least two b's ("abb", "abbbb", etc.)
"ab{3,5}" matches a string which has from three to five b's ("abbb", "abbbb", or "abbbbb")
Note: you must always specify the first number of a range (i.e., "{0,2}", not "{,2}"). Also, as you might have noticed, the symbols '*', '+', and '?' have the same effect as using the bounds "{0,}", "{1,}", and "{0,1}", respectively.
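The bounds can be checked by counting b's. The sketch below assumes Python's re module and builds test strings as an a followed by n b's. The variable names are illustrative only. Python itself happens to accept "{,2}" as "{0,2}", so the rule about the first number concerns the stricter syntax this sheet describes, not this example.

```python
import re

def b_counts(pattern, limit=8):
    """Return every n below limit for which 'a' + n b's is fully matched."""
    return [n for n in range(limit) if re.fullmatch(pattern, "a" + "b" * n)]

exactly_two = b_counts(r"ab{2}")      # [2]
at_least_two = b_counts(r"ab{2,}")    # [2, 3, 4, 5, 6, 7]
three_to_five = b_counts(r"ab{3,5}")  # [3, 4, 5]

# '*', '+' and '?' behave like the bounds {0,}, {1,} and {0,1}
star_same = b_counts(r"ab*") == b_counts(r"ab{0,}")
plus_same = b_counts(r"ab+") == b_counts(r"ab{1,}")
optional_same = b_counts(r"ab?") == b_counts(r"ab{0,1}")
```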
To quantify a sequence of characters, put them inside parentheses:
"a(bc)*" matches a string that has an a followed by zero or more copies of the sequence "bc"
"a(bc){1,5}" one through five copies of "bc."
'|' acts as an OR operator:
"hi|hello" matches a string that has either "hi" or "hello" in it "(b|cd)ef" a string that has either "bef" or "cdef" "(a|b)*c" a string that has a sequence of alternating a's and b's ending in a c
Bracket expressions specify which characters are allowed in a single position of a string:
"[ab]" matches a string that has either an a or a b (that's the same as "a|b") "[a-d]" a string that has lowercase letters 'a' through 'd' (that's equal to "a|b|c|d" and even "[abcd]") "^[a-zA-Z]" a string that starts with a letter "[0-9]%" a string that has a single digit before a percent sign ",[a-zA-Z0-9]$" a string that ends in a comma followed by an alphanumeric character
You can also list which characters you DON'T want -- just use a '^' as the first symbol in a bracket expression:
"%[^a-zA-Z]%" matches a string with a character that is not a letter between two percent signs
'.' stands for any single character:
"a.[0-9]" matches a string that has an a followed by one character and a digit
"^.{3}$" a string with exactly 3 characters
To be taken literally, each of the characters "^.[$()|*+?{\" must be escaped with a backslash ('\'), as they have special meaning.
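The effect of escaping can be sketched as below, assuming Python's re module. re.escape is a standard-library function that adds the backslashes for you. The sample strings are made up for illustration.

```python
import re

# Unescaped, '+' is a quantifier: "1+1" means one or more 1's followed by a 1
unescaped_plus = re.search(r"1+1", "1+1=2") is not None   # False
# Escaped, '+' is just a plus sign
escaped_plus = re.search(r"1\+1", "1+1=2") is not None    # True

# Unescaped, '.' matches any character; escaped, only a literal dot
dot_any = re.search(r"a.c", "abc") is not None            # True
dot_literal = re.search(r"a\.c", "abc") is not None       # False

# re.escape backslash-escapes every special character in a string
specials = "^.[$()|*+?{\\"
escaped_all = re.fullmatch(re.escape(specials), specials) is not None  # True
```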