§20.9. Summary of regular expression notation
MATCHING
Positional restrictions
^ | Matches (accepting no text) only at the start of the text |
$ | Matches (accepting no text) only at the end of the text |
\b | Word boundary: matches at either end of text or between a \w and a \W |
\B | Matches (accepting no text) wherever \b does not match |
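
The positional restrictions above accept no text themselves; they only limit where a match may occur. As a rough illustration, here is a minimal sketch using Python's "re" module as a stand-in. That choice is an assumption made for the example: Python is not the matcher described in this table, and its \w is defined slightly differently (it counts the underscore as a word character), but ^, $, \b and \B behave the same way on this sample text.

```python
import re

# Hypothetical sample text, chosen so that "cat" occurs both as a
# whole word and as the start of a longer word.
text = "the cat sat on the catalogue"

# ^ and $ consume nothing: they only pin the match to an end of the text.
starts_with_the = re.search(r"^the", text) is not None   # True
ends_with_cat = re.search(r"cat$", text) is not None     # False

# \b demands a word boundary at that point; \B forbids one.
whole_word = [m.start() for m in re.finditer(r"\bcat\b", text)]   # [4]
prefix_only = [m.start() for m in re.finditer(r"\bcat\B", text)]  # [19]

print(starts_with_the, ends_with_cat, whole_word, prefix_only)
```
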
Backslashed character classes
\char | If char is other than a-z, A-Z, 0-9 or space, matches that literal char |
\\ | For example, "\\" matches a literal backslash "\" |
\n | Matches literal line break character |
\t | Matches literal tab character (but use this only with external files) |
\d | Matches any single digit |
\l | Matches any lower case letter (by Unicode 4.0.0 definition) |
\p | Matches any single punctuation mark: . , ! ? - / " : ; ( ) [ ] { } |
\s | Matches any single spacing character (space, line break, tab) |
\u | Matches any upper case letter (by Unicode 4.0.0 definition) |
\w | Matches any single word character (neither \p nor \s) |
\D | Matches any single non-digit |
\L | Matches any non-lower-case-letter |
\P | Matches any single non-punctuation-mark |
\S | Matches any single non-spacing-character |
\U | Matches any non-upper-case-letter |
\W | Matches any single non-word-character (i.e., matches either \p or \s) |
Other character classes
. | Matches any single character |
<...> | Character range: matches any single character inside |
<^...> | Negated character range: matches any single character not inside |
Inside a character range
e-h | Any character in the run "e" to "h" inclusive (and so on for other runs) |
>... | Starting with ">" means that a literal close angle bracket is included |
\ | Backslash has the same meaning as for backslashed character classes: see above |
Structural
| | Divides alternatives: "fish|fowl" matches either |
(?i) | Always matches: switches to case-insensitive matching from here on |
(?-i) | Always matches: switches to case-sensitive matching from here on |
Repetitions
...? | Matches "..." either 0 or 1 times, i.e., makes "..." optional |
...* | Matches "..." 0 or more times: e.g. "\s*" matches an optional run of spacing characters |
...+ | Matches "..." 1 or more times: e.g. "x+" matches any run of "x"s |
...{6} | Matches "..." exactly 6 times (similarly for other numbers, of course) |
...{2,5} | Matches "..." between 2 and 5 times |
...{3,} | Matches "..." 3 or more times |
....? | "?" after any repetition (as in "...*?" or "...+?") makes it "lazy", matching as few repeats as it can |
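
The difference between an ordinary (greedy) repetition and a lazy one is easiest to see side by side. The sketch below assumes Python's "re" module as a stand-in, since its ?, {n}, {n,m} and lazy "?" suffix follow the same conventions as the rows above; the sample strings are invented for the example.

```python
import re

run = "xxxxxxx"  # seven x's

# Greedy: takes as many repeats as the bounds allow.
greedy = re.match(r"x{2,5}", run).group()   # "xxxxx"

# Lazy: the trailing "?" makes the same repetition take as few as it can.
lazy = re.match(r"x{2,5}?", run).group()    # "xx"

# An exact count must be met exactly when the whole text has to match.
exact = re.fullmatch(r"x{6}", run)          # None: seven is not six

# "?" on its own makes the preceding item optional.
optional = [bool(re.fullmatch(r"colou?r", w))
            for w in ("color", "colour", "colouur")]   # [True, True, False]

print(greedy, lazy, exact, optional)
```
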
Numbered subexpressions
(...) | Groups part of the expression together: matches if the interior matches |
\1 | Matches the contents of the 1st subexpression reading left to right |
\2 | Matches the contents of the 2nd, and so on up to "\9" (but no further) |
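
A numbered subexpression remembers the text it matched, so "\1" later in the pattern demands that same text again. A minimal sketch, again assuming Python's "re" module purely for illustration (its brackets and "\1" follow the same convention, though it does not share the limit of nine):

```python
import re

# (\w)\1 means: any word character, then the same character again.
doubled = re.findall(r"(\w)\1", "bookkeeper")      # ['o', 'k', 'e']

# Brackets also group, so a repetition can apply to more than one character.
grouped = bool(re.fullmatch(r"(ab)+", "ababab"))   # True

# Two subexpressions, numbered left to right: \1 is "a", \2 is "b".
mirrored = bool(re.fullmatch(r"(a)(b)\2\1", "abba"))  # True

print(doubled, grouped, mirrored)
```
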
Unnumbered subexpressions
(# ...) | Comment: always matches, and the contents are ignored |
(?= ...) | Lookahead: matches if the text ahead matches "...", but doesn't consume it |
(?! ...) | Negated lookahead: matches if lookahead fails |
(?<= ...) | Lookbehind: matches if the text behind matches "...", but doesn't consume it |
(?<! ...) | Negated lookbehind: matches if lookbehind fails |
(> ...) | Possessive: tries to match "..." and if it succeeds, never backtracks on this |
(?(1)...) | Conditional: if \1 has matched by now, require that "..." be matched |
(?(1)...|...) | Conditional: ditto, but if \1 has not matched, require the second part |
(?(?=...)...|...) | Conditional: if the lookahead "(?=...)" succeeds, require the first part; otherwise the second |
(?(?<=...)...|...) | Conditional: if the lookbehind "(?<=...)" succeeds, require the first part; otherwise the second |
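
Lookarounds and conditionals test the surrounding text without consuming any of it. The sketch below assumes Python's "re" module as a stand-in and covers only the forms it is known to share with the table: the four lookarounds and the conditional on a numbered subexpression. The comment, possessive and lookaround-conditioned forms are left out because Python writes them differently or not at all.

```python
import re

# Lookahead: digits only where " dollars" follows; that text is not consumed.
ahead = re.search(r"\d+(?= dollars)", "20 euros or 30 dollars").group()  # "30"

# Negated lookahead: "cat" only where "alogue" does not follow.
not_ahead = [m.start()
             for m in re.finditer(r"cat(?!alogue)", "catalogue cat")]    # [10]

# Lookbehind: digits only where "$" comes immediately before.
behind = re.findall(r"(?<=\$)\d+", "45 euros or $60")                    # ['60']

# Conditional: if subexpression 1 (an opening bracket) matched,
# a closing bracket is then required.
bracketed = re.compile(r"(\()?\d+(?(1)\))")
results = [bool(bracketed.fullmatch(s))
           for s in ("42", "(42)", "(42", "42)")]  # [True, True, False, False]

print(ahead, not_ahead, behind, results)
```
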
IN REPLACEMENT TEXT
\char | If char is other than a-z, A-Z, 0-9 or space, expands to that literal char |
\\ | In particular, "\\" expands to a literal backslash "\" |
\n | Expands to a line break character |
\t | Expands to a tab character (but use this only with external files) |
\0 | Expands to the full text matched |
\1 | Expands to whatever the 1st bracketed subexpression matched |
\2 | Expands to whatever the 2nd matched, and so on up to "\9" (but no further) |
\l0 | Expands to \0 converted to lower case (and so on for "\l1" to "\l9") |
\u0 | Expands to \0 converted to upper case (and so on for "\u1" to "\u9") |
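
To show how replacement text draws on what was matched, here is a minimal sketch assuming Python's "re" module as a stand-in. The correspondence is only partial, and the differences are assumptions of the example rather than features of the notation above: Python spells the full match "\g<0>" instead of "\0", and it has no "\l0" or "\u0", so case conversion is done here with a small function.

```python
import re

# \1 and \2 expand to what the 1st and 2nd subexpressions matched.
swapped = re.sub(r"(\w+) (\w+)", r"\2, \1", "Ada Lovelace")   # "Lovelace, Ada"

# The full text matched (written \g<0> in Python, \0 in the table above).
whole = re.sub(r"fish", r"[\g<0>]", "fish and chips")         # "[fish] and chips"

# \n in replacement text expands to a line break.
broken = re.sub(r", ", r"\n", "a, b")                         # "a" + line break + "b"

# Stand-ins for \u0 and \l0: convert the matched text with a function.
upper = re.sub(r"\bfish\b", lambda m: m.group(0).upper(), "a fish dish")
lower = re.sub(r"SHOUT", lambda m: m.group(0).lower(), "do not SHOUT")

print(swapped, whole, repr(broken), upper, lower)
```
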