
classification
Title: re module treats raw strings as normal strings
Type: behavior
Stage:
Components: Library (Lib), Regular Expressions
Versions: Python 2.4, Python 2.6, Python 2.5
process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To: akuchling
Nosy List: akuchling, ezio.melotti, georg.brandl, gvanrossum, loewis
Priority: normal
Keywords:

Created on 2008-10-23 03:55 by ezio.melotti, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name | Uploaded | Description
raw-strings-with-re.txt | ezio.melotti, 2008-10-23 03:55 | Interactive Python session with more examples
Messages (8)
msg75133 - Author: Ezio Melotti (ezio.melotti) (Python committer) Date: 2008-10-23 03:55
The re module seems to treat the raw strings as normal strings:
>>> 'a1a1a'.replace('1', r'\n') == re.sub('1', r'\n', 'a1a1a')
False
>>> 'a1a1a'.replace('1', '\n') == re.sub('1', r'\n', 'a1a1a')
True
In the first line, str.replace and re.sub should perform exactly the same
operation and return the same result, but re.sub replaces the 1s with
newlines instead of a literal '\' followed by 'n'.
The second line should evaluate to False, but re.sub again replaces the
1s with newlines, so the result equals the LHS.
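
For reference, the difference described above can be reproduced in isolation. This is only a minimal sketch; the variable names are illustrative and not taken from the attached session.

```python
import re

text = 'a1a1a'

# str.replace inserts the replacement verbatim: a backslash followed by 'n'.
literal = text.replace('1', r'\n')

# re.sub processes backslash escapes in the replacement string, so the
# two-character sequence \n is turned into a single newline character.
substituted = re.sub('1', r'\n', text)

print(repr(literal))      # 'a\\na\\na'
print(repr(substituted))  # 'a\na\na'
```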

>>> re.search(r'\n', 'a\na')
<_sre.SRE_Match object at 0x00A81BF0>
>>> r'\n' in 'a\na'
False
Searching for r'\n' in a string that contains a newline also returns a
result, even though r'\n' is not in 'a\na'.
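
A small sketch of the same observation, with re.escape added as one way to make the pattern match the two characters literally. The names haystack and needle are illustrative only.

```python
import re

haystack = 'a\na'   # contains one real newline character
needle = r'\n'      # two characters: a backslash followed by 'n'

# Plain substring test: the two-character needle is not in the haystack.
plain_found = needle in haystack

# As a regex, \n is an escape that matches a newline, so this matches.
regex_found = re.search(needle, haystack) is not None

# re.escape() makes the pattern match the needle's characters literally.
escaped_found = re.search(re.escape(needle), haystack) is not None
escaped_in_raw = re.search(re.escape(needle), r'a\na') is not None

print(plain_found, regex_found, escaped_found, escaped_in_raw)
# False True False True
```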

Tested with Py2.5 on Linux and Py2.4/2.6 on Windows.
The problem could be related to http://bugs.python.org/msg71861
A text file with more examples is attached.
msg75134 - Author: Guido van Rossum (gvanrossum) (Python committer) Date: 2008-10-23 04:02
No, re.sub()'s documentation
(http://docs.python.org/library/re.html#re.sub)
makes it clear that \ followed by n in the replacement string is
interpreted.

To insert \ followed by n you have to double the \ inside the raw string
like this:

>>> re.sub('a', r'\\n', 'abba')
'\\nbb\\n'
>>>
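
Besides doubling the backslash by hand, two other approaches should give the same result: passing a function as repl (its return value is inserted without escape processing), or doubling the backslashes programmatically. The sketch below assumes a helper named literal_sub, which is hypothetical and not part of the re module.

```python
import re

def literal_sub(pattern, replacement, text):
    """Hypothetical helper: insert `replacement` verbatim for each match."""
    # A callable repl returns the replacement text directly, so no
    # backslash escapes in it are processed.
    return re.sub(pattern, lambda match: replacement, text)

via_function = literal_sub('a', r'\n', 'abba')

# Alternative: double every backslash before handing the string to re.sub.
doubled = r'\n'.replace('\\', '\\\\')
via_doubling = re.sub('a', doubled, 'abba')

print(repr(via_function))  # '\\nbb\\n'
print(repr(via_doubling))  # '\\nbb\\n'
```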
msg75135 - Author: Ezio Melotti (ezio.melotti) (Python committer) Date: 2008-10-23 04:30
My bad, I only checked with help(re.sub).
In the examples with re.search I was indeed wrong: I forgot to
escape the \, and the regex engine treats \n as a newline (whereas \\n
is a literal \ followed by n). But I expected 'a1a1a'.replace('1',
r'\n') to return the same as re.sub('1', r'\n', 'a1a1a'), because
r'\n' is not a regex but a simple replacement string.
Also, the doc says "repl can be a string or a function; if it is a
string, any backslash escapes in it are processed. That is, \n is
converted to a single newline character, \r is converted to a linefeed,
and so forth." That is the standard behavior of a normal string; the
doc should mention that the backslashes are processed even in raw
strings and that they need to be escaped with a second \.
I think that changing the behavior of what is supposed to be a "normal
string" (the repl string) is not really a good idea, even if it is useful
for things like '\1\n' (assuming that is why it behaves differently).
I would prefer to use $1 instead of \1. Unlike '\1', $1 has (as far as
I know) no special meaning in Python, so there would be no problem with
raw strings.
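
For context, a short sketch of how group references in repl behave today. The re module does not use the $1 form suggested above; the explicit form it documents is \g<1>. The pattern and sample text here are made up for illustration.

```python
import re

pattern = r'(\w+) (\w+)'

# Raw string: re.sub receives backslash + digit and treats it as a
# reference to the corresponding group.
swapped = re.sub(pattern, r'\2 \1', 'hello world')

# Non-raw string: Python converts '\2' and '\1' to control characters
# before re.sub sees them, so no group reference is left.
not_swapped = re.sub(pattern, '\2 \1', 'hello world')

# The \g<N> form is an explicit way to write the same reference.
swapped_g = re.sub(pattern, r'\g<2> \g<1>', 'hello world')

print(repr(swapped))      # 'world hello'
print(repr(not_swapped))  # '\x02 \x01'
print(repr(swapped_g))    # 'world hello'
```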
msg75760 - Author: A.M. Kuchling (akuchling) (Python committer) Date: 2008-11-11 20:46
Re-opening as a documentation bug; we should at least make the re.sub
docstring match the text documentation.
msg77502 - Author: Martin v. Löwis (loewis) (Python committer) Date: 2008-12-10 08:40
IIUC, there is no proposed patch yet, so this is out of scope for 2.5.3.
msg77562 - Author: Guido van Rossum (gvanrossum) (Python committer) Date: 2008-12-10 18:04
Eh? It's just a doc bug now.
msg77575 - Author: Martin v. Löwis (loewis) (Python committer) Date: 2008-12-10 22:45
> Eh? It's just a doc bug now.

[assuming you are wondering why it is out of scope for 2.5.3]
I don't understand the actual issue (and don't have time to find out
what it is), so somebody else would have to provide a patch. Since there
is no patch, this issue will likely miss 2.5.3. (If it is just an error
in a docstring, I don't find it particularly necessary to fix it
in the bug fix release, either.)
msg78699 - Author: Georg Brandl (georg.brandl) (Python committer) Date: 2009-01-01 12:00
Added a bit to the re.sub and re.subn docstrings in r68118.
History
Date | User | Action | Args
2022-04-11 14:56:40 | admin | set | github: 48435
2009-01-01 12:00:35 | georg.brandl | set | status: open -> closed; nosy: + georg.brandl; resolution: fixed; messages: + msg78699
2008-12-10 22:45:14 | loewis | set | messages: + msg77575
2008-12-10 18:04:09 | gvanrossum | set | messages: + msg77562
2008-12-10 08:40:49 | loewis | set | nosy: + loewis; messages: + msg77502; versions: - Python 2.5.3
2008-11-11 20:46:45 | akuchling | set | status: closed -> open; assignee: akuchling; messages: + msg75760; resolution: not a bug -> (no value)
2008-11-11 20:07:41 | akuchling | set | nosy: + akuchling
2008-10-23 04:30:57 | ezio.melotti | set | messages: + msg75135
2008-10-23 04:02:18 | gvanrossum | set | status: open -> closed; nosy: + gvanrossum; messages: + msg75134; resolution: not a bug
2008-10-23 03:55:28 | ezio.melotti | create |