classification
Title: Strings that end with properly escaped backslashes cause error to be thrown in re.search/sub/etc. functions.
Type:
Stage: resolved
Components: Regular Expressions
Versions: Python 3.7, Python 3.6, Python 2.7
process
Status: closed
Resolution: out of date
Dependencies:
Superseder: Improve re documentation (issue 31714)
Assigned To:
Nosy List: Patrick Foley, ezio.melotti, mrabarnett, r.david.murray, serhiy.storchaka
Priority: normal
Keywords:

Created on 2017-04-21 21:18 by Patrick Foley, last changed 2017-11-16 13:25 by serhiy.storchaka. This issue is now closed.

Files
File name Uploaded Description
sample.py Patrick Foley, 2017-04-21 21:18
Messages (7)
msg292079 - Author: Patrick Foley (Patrick Foley) Date: 2017-04-21 21:18
The following code demonstrates:

import re
text = 'ab\\'
exp = re.compile('a')
print(re.sub(exp, text, ''))

If you remove the backslash(es), the code runs fine.

This appears to be specific to the re module and only to strings that end in (even properly escaped) backslashes. You could easily receive raw data like this from freehand input sources so it would be nice not to have to remove trailing backslashes before running a regular expression.
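A minimal sketch of the failure described above, assuming a CPython 3 interpreter. It uses the subject string 'a' instead of '' so the outcome does not depend on whether a match is needed before the replacement is parsed; the variable names `raised` and `ok` are illustrative only.

```python
import re

text = 'ab\\'               # three characters: 'a', 'b', and one backslash
pattern = re.compile('a')

try:
    # The replacement is parsed as a template, so the lone trailing
    # backslash is treated as the start of an incomplete escape.
    pattern.sub(text, 'a')
    raised = False
except re.error as exc:
    raised = True
    print('re.error:', exc)

# Without the trailing backslash the same call succeeds.
ok = pattern.sub('ab', 'a')
print(ok)
```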
msg292080 - Author: R. David Murray (r.david.murray) (Python committer) Date: 2017-04-21 21:51
I think you are missing a re.escape around text.  Text is otherwise not a valid replacement pattern.
msg292093 - Author: Matthew Barnett (mrabarnett) (Python triager) Date: 2017-04-22 01:17
Yes, the second argument is a replacement template, not a literal.

This issue does point out a different problem, though: re.escape will add backslashes that will then be treated as literals in the template, for example:

>>> re.sub(r'a', re.escape('(A)'), 'a')
'\\(A\\)'

re.escape doesn't always help.

The solution here is to pass a replacement function instead:

>>> re.sub(r'a', lambda m: '(A)', 'a')
'(A)'
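As a sketch of how the replacement-function approach applies to the original report (the subject string 'xay' is made up for illustration): the value returned by the callable is inserted as-is, with no template parsing, so a trailing backslash is harmless.

```python
import re

text = 'ab\\'   # ends in a single literal backslash

# The callable receives the match object and returns the literal text to insert.
result = re.sub('a', lambda m: text, 'xay')
print(result)
```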
msg292094 - Author: R. David Murray (r.david.murray) (Python committer) Date: 2017-04-22 02:18
Good point, re.escape is for literal text you want to insert into a matching pattern, but the replacement template isn't a matching pattern.  Do we need a different escape function?  I guess the function solution is enough?
msg292095 - Author: Matthew Barnett (mrabarnett) (Python triager) Date: 2017-04-22 02:54
The function solution does have a larger overhead than a literal.

Could the template be made more accepting of backslashes without breaking anything? (There's also issue29995 "re.escape() escapes too much", which might help.)
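A rough way to compare the two approaches, sketched with `timeit`. The subject string, repeat count, and function names are arbitrary choices for illustration, and the absolute timings will vary by machine and Python version; no particular ratio is implied.

```python
import re
import timeit

pattern = re.compile('a')
subject = 'a' * 1000        # arbitrary test input with many matches

def with_template():
    # Replacement given as a string template.
    return pattern.sub('(A)', subject)

def with_function():
    # Replacement given as a callable, invoked once per match.
    return pattern.sub(lambda m: '(A)', subject)

# Both approaches should produce identical output.
same_output = with_template() == with_function()

t_template = timeit.timeit(with_template, number=200)
t_function = timeit.timeit(with_function, number=200)
print('template: %.4fs  function: %.4fs' % (t_template, t_function))
```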
msg292102 - Author: Serhiy Storchaka (serhiy.storchaka) (Python committer) Date: 2017-04-22 04:49
re.escape() shouldn't be used for a replacement template. You just need to double the backslashes when escaping a literal string for a replacement template: s.replace('\\', '\\\\'). This should be documented if it is not already.
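The backslash-doubling suggestion can be sketched as a small helper. The name `escape_replacement` is hypothetical and not part of the re module, and the subject strings are made up for illustration.

```python
import re

def escape_replacement(literal):
    """Hypothetical helper: double each backslash so that `literal`
    is inserted verbatim when used as a replacement template."""
    return literal.replace('\\', '\\\\')

text = 'ab\\'                      # ends in a single literal backslash
result = re.sub('a', escape_replacement(text), 'xay')
print(result)

# Parentheses need no escaping in a template, unlike with re.escape().
parens = re.sub('a', escape_replacement('(A)'), 'a')

# A literal backslash-digit sequence is no longer read as a group reference.
literal_ref = re.sub('(a)', escape_replacement('\\1'), 'a')
```

Only backslashes are special in a replacement template, which is why doubling them is sufficient here.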
msg306358 - Author: Serhiy Storchaka (serhiy.storchaka) (Python committer) Date: 2017-11-16 13:25
The proper way of escaping the replacement string has been documented by issue31714.
History
Date User Action Args
2017-11-16 13:25:52 serhiy.storchaka set status: open -> closed
superseder: Improve re documentation
messages: + msg306358
resolution: out of date
stage: resolved
2017-04-29 02:44:49 terry.reedy set versions: + Python 3.6, Python 3.7, - Python 3.4
2017-04-22 04:49:22 serhiy.storchaka set nosy: + serhiy.storchaka
messages: + msg292102
2017-04-22 02:54:11 mrabarnett set messages: + msg292095
2017-04-22 02:18:21 r.david.murray set messages: + msg292094
2017-04-22 01:17:58 mrabarnett set messages: + msg292093
2017-04-21 21:51:01 r.david.murray set nosy: + r.david.murray
messages: + msg292080
2017-04-21 21:18:05 Patrick Foley create