This module provides regular expression matching operations similar to those found in Emacs.
Obsolescence note:
This module is obsolete as of Python version 1.5; it is still being
maintained because much existing code still uses it. All new code in
need of regular expressions should use the new
re
module, which supports the more powerful
and regular Perl-style regular expressions. Existing code should be
converted. The standard library module
reconvert
helps in converting
regex
style regular expressions to re
style regular expressions. (For more conversion help, see Andrew
Kuchling's ``regex-to-re HOWTO'' at
http://www.python.org/doc/howto/regex-to-re/.)
By default the patterns are Emacs-style regular expressions (with one exception). There is a way to change the syntax to match that of several well-known Unix utilities. The exception is that Emacs' "\s"pattern is not supported, since the original implementation references the Emacs syntax tables.
This module is 8-bit clean: both patterns and strings may contain null bytes and characters whose high bit is set.
Please note: There is a little-known fact about Python string
literals which means that you don't usually have to worry about
doubling backslashes, even though they are used to escape special
characters in string literals as well as in regular expressions. This
is because Python doesn't remove backslashes from string literals if
they are followed by an unrecognized escape character.
However, if you want to include a literal backslash in a
regular expression represented as a string literal, you have to
quadruple it or enclose it in a singleton character class.
E.g. to extract LATEX "\section{ ...}" headers
from a document, you can use this pattern:
'[\]section{\(.*\)}'
. Another exception:
the escape sequece "\b" is significant in string literals
(where it means the ASCII bell character) as well as in Emacs regular
expressions (where it stands for a word boundary), so in order to
search for a word boundary, you should use the pattern '\\b'
.
Similarly, a backslash followed by a digit 0-7 should be doubled to
avoid interpretation as an octal escape.