classification
Title: re.sub does not play nice with chr(92)
Type: behavior Stage: resolved
Components: Regular Expressions Versions: Python 3.7
process
Status: closed Resolution: not a bug
Dependencies: Superseder:
Assigned To: Nosy List: Samuel Warfield, ezio.melotti, mrabarnett, serhiy.storchaka
Priority: normal Keywords:

Created on 2018-10-26 04:36 by Samuel Warfield, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Messages (4)
msg328509 - Author: Samuel Warfield (Samuel Warfield) Date: 2018-10-26 04:36
Bug with regex substitutions. Calling re.sub() directly with chr(92),
the backslash character (written '\\' in source), as the replacement
throws an exception, whereas compiling a regex object and then calling
its own .sub() method works completely fine. I did a quick search of the
bug tracker for similar issues and none were reported.

# Breaks
re.sub(r'\\\\', chr(92), stringy_thingy)

vs 

# Works
parser = re.compile(r'\\\\')
parser.sub(chr(92), stringy_thingy)

# Where stringy_thingy is a string that is being substituted
msg328511 - Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2018-10-26 05:01
I'm assuming you want to replace double backslashes with single backslashes in stringy_thingy, so I defined stringy_thingy and tried both of your snippets, but they both fail:
>>> stringy_thingy = r'foo\\bar\\baz'
>>> print(stringy_thingy)  # stringy_thingy contains double backslashes
foo\\bar\\baz
>>> re.sub(r'\\\\', chr(92), stringy_thingy)  # fails
Traceback (most recent call last):
  ...
  File "/usr/lib/python3.6/sre_parse.py", line 245, in __next
    self.string, len(self.string) - 1) from None
sre_constants.error: bad escape (end of pattern) at position 0
>>>
>>> parser = re.compile(r'\\\\')
>>> parser.sub(chr(92), stringy_thingy)  # also fails
Traceback (most recent call last):
  ...
  File "/usr/lib/python3.6/sre_parse.py", line 245, in __next
    self.string, len(self.string) - 1) from None
sre_constants.error: bad escape (end of pattern) at position 0

Replacing chr(92) with r'\\' works for both:
>>> print(re.sub(r'\\\\', r'\\', stringy_thingy))
foo\bar\baz
>>> print(parser.sub(r'\\', stringy_thingy))
foo\bar\baz

The docs[0] say: "repl can be a string or a function; if it is a string, any backslash escapes in it are processed."
So passing chr(92) (or '\\', which is equivalent) results in the above error ("bad escape (end of pattern)") because a lone backslash is seen as an incomplete escape sequence.  Passing r'\\' works as intended.
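
To make the difference between the two replacement strings concrete, here is a minimal sketch. The variable names single, escaped and result are only illustrative; the pattern and stringy_thingy are the ones used above.

```python
import re

single = chr(92)    # one backslash character, same as '\\'
escaped = r'\\'     # two backslash characters: an escaped backslash for repl

assert single == '\\' and len(single) == 1
assert len(escaped) == 2

stringy_thingy = r'foo\\bar\\baz'   # contains doubled backslashes

# The two-character replacement is processed as an escape and yields
# one backslash per match.
result = re.sub(r'\\\\', escaped, stringy_thingy)
print(result)

# The one-character replacement is an incomplete escape sequence,
# so re.sub raises re.error while processing it.
try:
    re.sub(r'\\\\', single, stringy_thingy)
except re.error as exc:
    print('re.error:', exc)
```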

ISTM there is no bug and re.sub works as documented.  Can you provide a stringy_thingy for which the first of your snippets fails but the second succeeds?

[0]: https://docs.python.org/3/library/re.html#re.sub
msg328561 - Author: Matthew Barnett (mrabarnett) * (Python triager) Date: 2018-10-26 15:19
@Ezio: the value of stringy_thingy is irrelevant because it never gets that far; it fails when it tries to parse the replacement, which occurs before attempting any matching.

I can't reproduce the difference either.
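
A small check of the point that the subject string does not matter, assuming CPython's behavior of processing the replacement before any matching is attempted. The helper name replacement_is_rejected is hypothetical and used only for this sketch.

```python
import re

def replacement_is_rejected(subject):
    """Return True if re.sub raises re.error for a lone-backslash replacement."""
    try:
        re.sub(r'\\\\', chr(92), subject)
    except re.error:
        return True
    return False

# An empty string, a string without backslashes, and a string that
# would match are all tried; the subject is never actually searched.
subjects = ('', 'no backslashes here', r'foo\\bar')
for subject in subjects:
    print(repr(subject), replacement_is_rejected(subject))
```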
msg335838 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2019-02-18 14:53
chr(92) is a single backslash character ('\\'). Since a backslash has special meaning in the replacement string, it should be escaped.

    re.sub(r'\\\\', r'\\', stringy_thingy)
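
When the replacement text comes from elsewhere and has to be inserted verbatim, a few alternatives to hand-writing the escape are sketched below. The names literal_repl, via_escape, via_function and via_str are illustrative only, and stringy_thingy is assumed to be the example string from the earlier message.

```python
import re

stringy_thingy = r'foo\\bar\\baz'
literal_repl = chr(92)   # the text we want inserted verbatim

# Option 1: escape backslashes in the replacement before passing it to re.sub.
escaped_repl = literal_repl.replace('\\', r'\\')
via_escape = re.sub(r'\\\\', escaped_repl, stringy_thingy)

# Option 2: pass a function; its return value is inserted as-is,
# without backslash escape processing.
via_function = re.sub(r'\\\\', lambda match: literal_repl, stringy_thingy)

# Option 3: for a fixed substring, str.replace avoids regexes entirely.
via_str = stringy_thingy.replace(chr(92) * 2, chr(92))

print(via_escape, via_function, via_str)
```

All three produce the same string here; which one fits depends on whether the pattern really needs to be a regular expression.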
History
Date | User | Action | Args
2022-04-11 14:59:07 | admin | set | github: 79253
2019-02-18 14:53:36 | serhiy.storchaka | set | status: open -> closed; nosy: + serhiy.storchaka; messages: + msg335838; stage: test needed -> resolved
2018-10-26 15:19:12 | mrabarnett | set | status: pending -> open; messages: + msg328561
2018-10-26 05:01:14 | ezio.melotti | set | status: open -> pending; type: behavior; messages: + msg328511; resolution: not a bug; stage: test needed
2018-10-26 04:36:16 | Samuel Warfield | create |