Getting Started

Import the regular expressions module

import re
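
As a rough sketch of what the import enables, the snippet below searches for one pattern and collects all matches of another. The sample text and variable names are made up for illustration.

```python
import re

# Hypothetical sample text, used only for illustration.
text = "Order 66 shipped on 2024-05-17"

# re.search returns the first match object, or None if nothing matches.
match = re.search(r"\d{4}-\d{2}-\d{2}", text)
date = match.group(0) if match else None

# re.findall returns every non-overlapping match as a list of strings.
numbers = re.findall(r"\d+", text)

print(date)     # 2024-05-17
print(numbers)  # ['66', '2024', '05', '17']
```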

Control verb

Pattern Description
(*ACCEPT) Control verb
(*FAIL) Control verb
(*MARK:NAME) Control verb
(*COMMIT) Control verb
(*PRUNE) Control verb
(*SKIP) Control verb
(*THEN) Control verb
(*UTF) Pattern modifier
(*UTF8) Pattern modifier
(*UTF16) Pattern modifier
(*UTF32) Pattern modifier
(*UCP) Pattern modifier
(*CR) Line break modifier
(*LF) Line break modifier
(*CRLF) Line break modifier
(*ANYCRLF) Line break modifier
(*ANY) Line break modifier
\R Matches any line break sequence (behavior set by the BSR modifiers)
(*BSR_ANYCRLF) Line break modifier
(*BSR_UNICODE) Line break modifier
(*LIMIT_MATCH=x) Regex engine modifier
(*LIMIT_RECURSION=d) Regex engine modifier
(*NO_AUTO_POSSESS) Regex engine modifier
(*NO_START_OPT) Regex engine modifier

POSIX Character Classes

Character Class Same as Meaning
[[:alnum:]] [0-9A-Za-z] Letters and digits
[[:alpha:]] [A-Za-z] Letters
[[:ascii:]] [\x00-\x7F] ASCII codes 0-127
[[:blank:]] [\t ] Space or tab only
[[:cntrl:]] [\x00-\x1F\x7F] Control characters
[[:digit:]] [0-9] Decimal digits
[[:graph:]] [[:alnum:][:punct:]] Visible characters (not space)
[[:lower:]] [a-z] Lowercase letters
[[:print:]] [ -~] (same as [ [:graph:]]) Printable characters, including space
[[:punct:]] [!"#$%&'()*+,\-./:;<=>?@[\\\]^_`{|}~] Punctuation characters
[[:space:]] [\t\n\v\f\r ] Whitespace
[[:upper:]] [A-Z] Uppercase letters
[[:word:]] [0-9A-Za-z_] Word characters
[[:xdigit:]] [0-9A-Fa-f] Hexadecimal digits
[[:<:]] \b(?=\w) Start of word
[[:>:]] \b(?<=\w) End of word
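
Python's built-in re module does not recognise POSIX bracket classes, so in Python the "Same as" column is what you would actually write. The sketch below assumes ASCII-only input; the constant names are hypothetical.

```python
import re

# Hypothetical constants standing in for POSIX classes (ASCII only).
XDIGIT = "[0-9A-Fa-f]"  # stands in for [[:xdigit:]]
ALNUM = "[0-9A-Za-z]"   # stands in for [[:alnum:]]

# A '#' followed by exactly six hexadecimal digits.
hex_color = re.compile("#" + XDIGIT + "{6}")
colors = hex_color.findall("#1A2b3C and #ffffff, but not #xyz123")

# One or more letters or digits, and nothing else.
is_token = re.fullmatch(ALNUM + "+", "abc123") is not None

print(colors)    # ['#1A2b3C', '#ffffff']
print(is_token)  # True
```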

Recurse

Pattern Description
(?R) Recurse entire pattern
(?1) Recurse first subpattern
(?+1) Recurse first relative subpattern
(?&name) Recurse subpattern name
(?P=name) Match subpattern name
(?P>name) Recurse subpattern name
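
The recursion constructs in this table ((?R), (?1), (?&name), (?P>name)) come from PCRE-style engines, and Python's built-in re module does not implement them. The named backreference (?P=name) does work in re, so the sketch below uses it to find a doubled word. The sample sentence is made up.

```python
import re

# (?P<word>...) captures a word; (?P=word) must match the same text again.
doubled = re.compile(r"\b(?P<word>\w+)\s+(?P=word)\b")

m = doubled.search("this is is a test")
repeated_text = m.group(0) if m else None
repeated_word = m.group("word") if m else None

print(repeated_text)  # is is
print(repeated_word)  # is
```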

Flags/Modifiers

Pattern Description
g Global
m Multiline
i Case insensitive
x Ignore whitespace
s Single line
u Unicode
X eXtra
U Ungreedy
A Anchor
J Duplicate group names
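
The single-letter flags above follow PCRE-style notation. In Python the closest equivalents are constants on the re module: re.IGNORECASE for i, re.MULTILINE for m, re.DOTALL for s and re.VERBOSE for x. There is no g flag; re.findall and re.finditer cover that role. A minimal sketch with made-up sample text:

```python
import re

text = "Alpha\nbeta\nALPHA"

# i + m: ignore case, and let ^ and $ match at every line boundary.
hits = re.findall(r"^alpha$", text, flags=re.IGNORECASE | re.MULTILINE)

# s: let the dot match a newline as well.
spans_lines = re.search(r"Alpha.beta", text, flags=re.DOTALL) is not None

# x: whitespace and # comments inside the pattern are ignored.
phone = re.compile(r"""
    \d{3}   # prefix
    -
    \d{4}   # line number
""", re.VERBOSE)
number = phone.search("call 555-0199").group(0)

print(hits)         # ['Alpha', 'ALPHA']
print(spans_lines)  # True
print(number)       # 555-0199
```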

Lookarounds

Pattern Description
(?=...) Positive Lookahead
(?!...) Negative Lookahead
(?<=...) Positive Lookbehind
(?<!...) Negative Lookbehind
A lookaround asserts that a pattern does (or does not) appear immediately before (lookbehind) or after (lookahead) the current position, without including that text in the result.
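
A short sketch of all four lookarounds in Python's re module. The sample string is invented; note that re requires lookbehind patterns to have a fixed width.

```python
import re

price_text = "Cost: $42 or 17 EUR"

# Positive lookahead: a word that is followed by a colon.
label = re.findall(r"\w+(?=:)", price_text)

# Positive lookbehind: digits that directly follow a dollar sign.
dollars = re.findall(r"(?<=\$)\d+", price_text)

# Negative lookbehind: digit runs not preceded by '$' or another digit.
not_dollars = re.findall(r"(?<![\$\d])\d+", price_text)

# Negative lookahead: digit runs not followed by ' EUR' or another digit.
not_euros = re.findall(r"\d+(?! EUR|\d)", price_text)

print(label)        # ['Cost']
print(dollars)      # ['42']
print(not_dollars)  # ['17']
print(not_euros)    # ['42']
```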

Assertions

Pattern Description
(?(1)yes|no) Conditional statement
(?(R)yes|no) Conditional statement
(?(R#)yes|no) Recursive Conditional statement
(?(R&name)yes|no) Conditional statement
(?(?=...)yes|no) Lookahead conditional
(?(?<=...)yes|no) Lookbehind conditional
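
Of the conditionals listed, Python's re module supports only the group-reference form, (?(1)yes|no) or (?(name)yes|no); the recursion and lookaround conditions belong to PCRE-style engines. The sketch below accepts a number that is either bare or fully parenthesised. The pattern is an illustration, not a canonical recipe.

```python
import re

# If group 1 (an opening parenthesis) matched, require a closing one.
# The "no" branch is omitted, so nothing extra is required otherwise.
balanced = re.compile(r"(\()?\d+(?(1)\))")

results = {
    s: balanced.fullmatch(s) is not None
    for s in ["123", "(123)", "(123", "123)"]
}

print(results)
# {'123': True, '(123)': True, '(123': False, '123)': False}
```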

Group Constructs

Pattern Description
(...) Capture everything enclosed
(a|b) Match either a or b
(?:...) Match everything enclosed without capturing it (non-capturing group)
(?>...) Atomic group (non-capturing)
(?|...) Duplicate subpattern group number
(?#...) Comment
(?'name'...) Named Capturing Group
(?<name>...) Named Capturing Group
(?P<name>...) Named Capturing Group
(?imsxXU) Inline modifiers
(?(DEFINE)...) Pre-define patterns before using them
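
To illustrate the difference between capturing, non-capturing and named groups, here is a small Python sketch. Python's re module uses the (?P<name>...) spelling for named groups; the sample strings are made up.

```python
import re

# Named groups make the parts of a match available by name.
date_re = re.compile(r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})")
m = date_re.search("Released 2023-11-05")
parts = m.groupdict()

# With a capturing group, findall returns the group's (last) capture.
captured = re.findall(r"(ab)+", "ababab ab")

# With a non-capturing group, findall returns the whole match.
whole = re.findall(r"(?:ab)+", "ababab ab")

print(parts)     # {'year': '2023', 'month': '11', 'day': '05'}
print(captured)  # ['ab', 'ab']
print(whole)     # ['ababab', 'ab']
```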

Substitution

Pattern Description
\0 Complete match contents
\1 Contents in capture group 1
$1 Contents in capture group 1
${foo} Contents in capture group foo
\x20 Hexadecimal replacement values
\x{06fa} Hexadecimal replacement values
\t Tab
\r Carriage return
\n Newline
\f Form-feed
\U Uppercase Transformation
\L Lowercase Transformation
\E Terminate any Transformation
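
Replacement syntax varies between engines. In Python's re.sub, groups are referenced as \1 or \g<name>; the $1, ${foo}, \U and \L forms above are not recognised there. A case transformation can be sketched with a replacement function instead, as below. The sample strings are invented.

```python
import re

# Numbered group references in the replacement string.
swapped = re.sub(r"(\w+) (\w+)", r"\2 \1", "hello world")

# Named group references use \g<name>.
named = re.sub(
    r"(?P<first>\w+) (?P<last>\w+)",
    r"\g<last>, \g<first>",
    "Ada Lovelace",
)

# A function replacement stands in for \U: uppercase each word's first letter.
capitalised = re.sub(r"\b\w", lambda m: m.group(0).upper(), "hello world")

print(swapped)      # world hello
print(named)        # Lovelace, Ada
print(capitalised)  # Hello World
```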

Anchors

Pattern Description
\G Start of match
^ Start of string
$ End of string
\A Start of string
\Z End of string (or just before a final newline)
\z Absolute end of string
\b A word boundary
\B Non-word boundary
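
Anchor behavior differs slightly between engines. The sketch below shows how Python's re module treats ^, $, \Z and \b. In Python, \Z matches only at the very end of the string (the role \z plays in the table above), and \G is not available. The sample text is made up.

```python
import re

text = "first line\nsecond line\n"

# Without MULTILINE, ^ matches only at the start of the string.
first_only = re.findall(r"^\w+", text)

# With MULTILINE, ^ also matches right after each newline.
each_line = re.findall(r"^\w+", text, flags=re.MULTILINE)

# $ may match just before a trailing newline; Python's \Z may not.
dollar_hit = re.search(r"line$", text) is not None
strict_end_hit = re.search(r"line\Z", text) is not None

# \b restricts the match to whole words.
whole_words = re.findall(r"\bcat\b", "cat concat catalog")

print(first_only)      # ['first']
print(each_line)       # ['first', 'second']
print(dollar_hit)      # True
print(strict_end_hit)  # False
print(whole_words)     # ['cat']
```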

Meta Sequences

Pattern Description
. Any single character
\s Any whitespace character
\S Any non-whitespace character
\d Any digit, same as [0-9]
\D Any non-digit, same as [^0-9]
\w Any word character
\W Any non-word character
\X Any Unicode sequences, linebreaks included
\C Match one data unit
\R Unicode newlines
\v Vertical whitespace character
\V Negation of \v: anything except vertical whitespace (newlines, vertical tabs)
\h Horizontal whitespace character
\H Negation of \h
\K Reset match
\n Match nth subpattern
\pX Unicode property X
\p{...} Unicode property or script category
\PX Negation of \pX
\P{...} Negation of \p
\Q...\E Quote; treat as literals
\k<name> Match subpattern name
\k'name' Match subpattern name
\k{name} Match subpattern name
\gn Match nth subpattern
\g{n} Match nth subpattern
\g<n> Recurse nth capture group
\g'n' Recurse nth capture group
\g{-n} Match nth relative previous subpattern
\g<+n> Recurse nth relative upcoming subpattern
\g'+n' Recurse nth relative upcoming subpattern
\g'letter' Recurse named capture group letter
\g{letter} Match previously-named capture group letter
\g<letter> Recurse named capture group letter
\xYY Hex character YY
\x{YYYY} Hex character YYYY
\ddd Octal character ddd
\cY Control character Y
[\b] Backspace character
\ Makes any character literal
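
Many of the sequences above are PCRE-specific, but the core shorthand classes (\d, \w, \s and \b) behave as described in Python's re module. A minimal sketch with invented sample text:

```python
import re

text = "tel: 555-0199, zip 90210"

# \d+ collects each run of digits.
digits = re.findall(r"\d+", text)

# \b...\b keeps only whole lowercase words.
words = re.findall(r"\b[a-z]+\b", text)

# \s+ splits on any run of whitespace (spaces, tabs, newlines).
fields = re.split(r"\s+", "a  b\tc")

print(digits)  # ['555', '0199', '90210']
print(words)   # ['tel', 'zip']
print(fields)  # ['a', 'b', 'c']
```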

Common Metacharacters

  • ^
  • {
  • +
  • <
  • [
  • *
  • )
  • >
  • .
  • (
  • |
  • $
  • \
  • ?

Escape these special characters with \
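
When the text to match comes from elsewhere, escaping by hand is error-prone. In Python, re.escape adds the backslashes for you. The sketch below contrasts an unescaped and an escaped pattern; the sample string is made up.

```python
import re

literal = "1+1=2"

# Unescaped, '+' is a quantifier, so the pattern means "one or more 1s,
# then 1=2" and does not match the literal text.
naive_hit = re.search(literal, "1+1=2") is not None

# re.escape backslash-escapes the special characters.
escaped = re.escape(literal)
escaped_hit = re.search(escaped, "1+1=2") is not None

print(naive_hit)    # False
print(escaped)      # 1\+1=2
print(escaped_hit)  # True
```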

Quantifiers

Pattern Description
a? Zero or one of a
a* Zero or more of a
a+ One or more of a
[0-9]+ One or more of 0-9
a{3} Exactly 3 of a
a{3,} 3 or more of a
a{3,6} Between 3 and 6 of a
a* Greedy quantifier
a*? Lazy quantifier
a*+ Possessive quantifier
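
The difference between greedy and lazy quantifiers is easiest to see side by side. The sketch below uses Python's re module with a made-up snippet of markup. Possessive quantifiers are left out because they require Python 3.11 or later.

```python
import re

html = "<b>one</b><b>two</b>"

# Greedy: .* grabs as much as possible, so one match spans both tags.
greedy = re.findall(r"<b>.*</b>", html)

# Lazy: .*? stops at the first closing tag it can.
lazy = re.findall(r"<b>.*?</b>", html)

# Bounded repetition: between 3 and 6 of 'a'.
in_range = re.fullmatch(r"a{3,6}", "aaaa") is not None
too_short = re.fullmatch(r"a{3,6}", "aa") is not None

print(greedy)     # ['<b>one</b><b>two</b>']
print(lazy)       # ['<b>one</b>', '<b>two</b>']
print(in_range)   # True
print(too_short)  # False
```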

Character Classes

Pattern Description
[abc] A single character of: a, b or c
[^abc] A character except: a, b or c
[a-z] A character in the range: a-z
[^a-z] A character not in the range: a-z
[0-9] A digit in the range: 0-9
[a-zA-Z] A character in the range: a-z or A-Z
[a-zA-Z0-9] A character in the range: a-z, A-Z or 0-9
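
A brief Python sketch of positive and negated character classes, using invented sample strings:

```python
import re

# [aeiou] matches a single character from the set.
vowels = re.findall(r"[aeiou]", "regex")

# [^...] matches any character NOT in the set; here it strips
# everything except ASCII letters and digits.
cleaned = re.sub(r"[^a-zA-Z0-9]", "", "user_name-42!")

# Ranges can be combined and quantified.
is_lower_id = re.fullmatch(r"[a-z][a-z0-9]*", "abc123") is not None

print(vowels)       # ['e', 'e']
print(cleaned)      # username42
print(is_lower_id)  # True
```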

Introduction

This is a quick cheat sheet for getting started with regular expressions.
