
classification
Title: re.sub backreferences to numbered groups produce garbage
Type: behavior
Stage: resolved
Components: Regular Expressions
Versions: Python 2.7

process
Status: closed
Resolution: not a bug
Dependencies:
Superseder:
Assigned To: ezio.melotti
Nosy List: Phillip.M.Feldman, ezio.melotti, mrabarnett
Priority: normal
Keywords:

Created on 2012-03-07 17:28 by Phillip.M.Feldman, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Messages (2)
msg155100 - Author: Phillip Feldman (Phillip.M.Feldman) Date: 2012-03-07 17:28
The first example below works; the second one produces output containing garbage characters. (This came up while I was creating a set of examples for a tutorial on regular expressions.)

import re

text= "The cat ate the rat."
print("before: %s" % text)
m= re.search("The (\w+) ate the (\w+)", text)
text= "The %s ate the %s." % (m.group(2), m.group(1))
print("after : %s" % text)

text= "The cat ate the rat."
print("before: %s" % text)
text= re.sub("(\w+) ate the (\w+)", "\2 ate the \1", text)
print("after : %s" % text)

msg155104 - Author: Ezio Melotti (ezio.melotti) (Python committer) Date: 2012-03-07 17:47
You forgot to use raw strings:
>>> text = "The cat ate the rat."
>>> print("before: %s" % text)
before: The cat ate the rat.
>>> text = re.sub("(\w+) ate the (\w+)", r"\2 ate the \1", text)
>>> print("after : %s" % text)
after : The rat ate the cat.
>>> 
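
As a minimal sketch of why the raw string matters: in an ordinary string literal, "\2" and "\1" are octal escapes for the control characters chr(2) and chr(1), so re.sub never sees a backreference and inserts those characters literally. The variable names `plain`, `raw`, `broken`, and `fixed` below are illustrative and do not come from the original report.

```python
import re

text = "The cat ate the rat."

# Ordinary literal: the escapes are resolved by the Python parser,
# so this string contains chr(2) and chr(1), not backslashes.
plain = "\2 ate the \1"

# Raw literal: the backslashes survive and reach re.sub as group references.
raw = r"\2 ate the \1"

print(repr(plain))  # '\x02 ate the \x01'
print(repr(raw))    # '\\2 ate the \\1'

# The pattern is also written as a raw string so that \w reaches the
# regex engine unchanged.
pattern = r"(\w+) ate the (\w+)"

broken = re.sub(pattern, plain, text)  # control characters are inserted
fixed = re.sub(pattern, raw, text)     # groups 2 and 1 are swapped in

print(repr(broken))  # 'The \x02 ate the \x01.'
print(fixed)         # The rat ate the cat.
```

Doubling the backslashes ("\\2 ate the \\1") or using the explicit form r"\g<2> ate the \g<1>" should give the same result as the raw string.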

(Maybe you should reconsider writing yet another tutorial about regular expressions, and possibly submit patches to improve the official regex howto if you think it's not good enough.)
History
Date                 User               Action  Args
2022-04-11 14:57:27  admin              set     github: 58429
2012-03-07 17:47:43  ezio.melotti       set     status: open -> closed; messages: + msg155104; assignee: ezio.melotti; resolution: not a bug; stage: resolved
2012-03-07 17:28:33  Phillip.M.Feldman  create