Index: Doc/howto/regex.rst
===================================================================
--- Doc/howto/regex.rst (revision 87896)
+++ Doc/howto/regex.rst (working copy)
@@ -5,7 +5,6 @@
 ****************************
 
 :Author: A.M. Kuchling
-:Release: 0.05
 
 .. TODO:
    Document lookbehind assertions
@@ -24,11 +23,6 @@
 Introduction
 ============
 
-The :mod:`re` module was added in Python 1.5, and provides Perl-style regular
-expression patterns. Earlier versions of Python came with the :mod:`regex`
-module, which provided Emacs-style patterns. The :mod:`regex` module was
-removed completely in Python 2.5.
-
 Regular expressions (called REs, or regexes, or regex patterns) are essentially
 a tiny, highly specialized programming language embedded inside Python and made
 available through the :mod:`re` module. Using this little language, you specify
@@ -264,7 +258,7 @@
    >>> import re
    >>> p = re.compile('ab*')
    >>> p
-   <_sre.SRE_Pattern object at 80b4150>
+   <_sre.SRE_Pattern object at 0x80b4150>
 
 :func:`re.compile` also accepts an optional *flags* argument, used to enable
 various special features and syntax variations. We'll go over the available
@@ -362,22 +356,21 @@
 and more.
 
 You can learn about this by interactively experimenting with the :mod:`re`
-module. If you have Tkinter available, you may also want to look at
-:file:`Tools/scripts/redemo.py`, a demonstration program included with the
+module. If you have :mod:`tkinter` available, you may also want to look at
+:file:`Tools/scripts/demo.py`, a demonstration program included with the
 Python distribution. It allows you to enter REs and strings, and displays
 whether the RE matches or fails. :file:`redemo.py` can be quite useful when
 trying to debug a complicated RE. Phil Schwartz's `Kodos
-`_ is also an interactive tool for developing and
-testing RE patterns.
+`_ is also an
+interactive tool for developing and testing RE patterns.
 
 This HOWTO uses the standard Python interpreter for its examples. First, run the
 Python interpreter, import the :mod:`re` module, and compile a RE::
 
-   Python 2.2.2 (#1, Feb 10 2003, 12:57:01)
    >>> import re
    >>> p = re.compile('[a-z]+')
    >>> p
-   <_sre.SRE_Pattern object at 80c3c28>
+   <_sre.SRE_Pattern object at 0x80c3c28>
 
 Now, you can try matching various strings against the RE ``[a-z]+``. An empty
 string shouldn't match at all, since ``+`` means 'one or more repetitions'.
@@ -395,7 +388,7 @@
 
    >>> m = p.match('tempo')
    >>> m
-   <_sre.SRE_Match object at 80c4f68>
+   <_sre.SRE_Match object at 0x80c4f68>
 
 Now you can query the :class:`MatchObject` for information about the matching
 string. :class:`MatchObject` instances also have several methods and
@@ -434,7 +427,7 @@
    >>> print(p.match('::: message'))
    None
    >>> m = p.search('::: message') ; print(m)
-
+   <_sre.SRE_Match object at 0x80c9650>
    >>> m.group()
    'message'
    >>> m.span()
@@ -459,7 +452,7 @@
 
 :meth:`findall` has to create the entire list before it can be returned as the
 result. The :meth:`finditer` method returns a sequence of :class:`MatchObject`
-instances as an :term:`iterator`. [#]_ ::
+instances as an :term:`iterator`::
 
    >>> iterator = p.finditer('12 drummers drumming, 11 ... 10 ...')
    >>> iterator
@@ -485,7 +478,7 @@
    >>> print(re.match(r'From\s+', 'Fromage amk'))
    None
    >>> re.match(r'From\s+', 'From amk Thu May 14 19:12:10 1998')
-
+   <_sre.SRE_Match object at 0x80c5978>
 
 Under the hood, these functions simply create a pattern object for you
 and call the appropriate method on it. They also store the compiled object in a
@@ -687,7 +680,7 @@
    line, the RE to use is ``^From``. ::
 
       >>> print(re.search('^From', 'From Here to Eternity'))
-
+      <_sre.SRE_Match object at 0x80c1520>
       >>> print(re.search('^From', 'Reciting From Memory'))
       None
 
@@ -699,11 +692,11 @@
    or any location followed by a newline character. ::
 
       >>> print(re.search('}$', '{block}'))
-
+      <_sre.SRE_Match object at 0x80adfa8>
       >>> print(re.search('}$', '{block} '))
       None
       >>> print(re.search('}$', '{block}\n'))
-
+      <_sre.SRE_Match object at 0x80adfa8>
 
    To match a literal ``'$'``, use ``\$`` or enclose it inside a character class,
    as in ``[$]``.
@@ -728,7 +721,7 @@
 
       >>> p = re.compile(r'\bclass\b')
       >>> print(p.search('no class at all'))
-
+      <_sre.SRE_Match object at 0x80c8f28>
       >>> print(p.search('the declassified algorithm'))
       None
       >>> print(p.search('one subclass is'))
@@ -746,7 +739,7 @@
       >>> print(p.search('no class at all'))
       None
       >>> print(p.search('\b' + 'class' + '\b') )
-
+      <_sre.SRE_Match object at 0x80c3ee0>
 
    Second, inside a character class, where there's no use for this assertion,
    ``\b`` represents the backspace character, for compatibility with Python's
@@ -1316,7 +1309,7 @@
 be *very* complicated. Use an HTML or XML parser module for such tasks.)
 
 
-Not Using re.VERBOSE
+Using re.VERBOSE
 --------------------
 
 By now you've probably noticed that regular expressions are a very compact
@@ -1366,8 +1359,3 @@
 now-removed :mod:`regex` module, which won't help you much.) Consider checking
 it out from your library.
 
-
-.. rubric:: Footnotes
-
-.. [#] Introduced in Python 2.2.2.
-