Title: re.escape does not correctly escape newlines
Type: enhancement Stage: resolved
Components: Regular Expressions Versions: Python 3.10, Python 3.9, Python 3.8, Python 3.7
Status: closed Resolution: not a bug
Dependencies: Superseder:
Assigned To: Nosy List: MartinAltmayer, ezio.melotti, mrabarnett, serhiy.storchaka
Priority: normal Keywords:

Created on 2020-12-17 12:00 by MartinAltmayer, last changed 2020-12-17 14:41 by serhiy.storchaka. This issue is now closed.

Files: simple test file (uploaded by MartinAltmayer, 2020-12-17 12:00)
Messages (4)
msg383237 - (view) Author: Martin Altmayer (MartinAltmayer) * Date: 2020-12-17 12:00
re.escape('\n') returns '\\\n', i.e. a string consisting of a backslash and a newline. I believe it should return '\\n', i.e. a backslash and an 'n'. If the escape-result still contains a verbatim newline, why escape this character at all?

Note that Python's regular expression engine accepts a backslash followed by a literal newline, so re.match(re.escape('\n'), '\n') gives a match. Thus, while this looks like undesired behavior, it is not functionally broken.

The same problem applies to some other characters: \t\r\v\f
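
As a quick check of the behavior described above, the following minimal sketch prints what re.escape returns for each of the listed characters and confirms that the result still matches the original character. It assumes Python 3.7 or later; the variable names are only illustrative.

```python
import re

# The characters named in the report: newline, tab, carriage return,
# vertical tab, and form feed.
for ch in "\n\t\r\v\f":
    escaped = re.escape(ch)
    # Each result is a backslash followed by the raw character itself,
    # not a backslash followed by a letter such as 'n' or 't'.
    still_matches = re.fullmatch(escaped, ch) is not None
    print(repr(ch), "->", repr(escaped), "matches:", still_matches)
```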
msg383239 - (view) Author: Matthew Barnett (mrabarnett) * (Python triager) Date: 2020-12-17 12:55
In a regex, putting a backslash before any character that's not an ASCII-range letter or digit makes it a literal. re.escape doesn't special-case control characters. Its purpose is to make a string that might contain metacharacters into one that's a literal, and it already does that, although it sometimes escapes more than necessary.
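
To illustrate the purpose described above, here is a small sketch comparing a string used directly as a pattern with the same string passed through re.escape. The sample string is a hypothetical example, not taken from the attached test file.

```python
import re

# Hypothetical sample text containing the metacharacters '+' and '?'.
text = "1+1=2?"

# Used directly, '+' and '?' act as quantifiers, so the pattern does not
# match its own source text.
raw_match = re.fullmatch(text, text)

# After re.escape, every metacharacter is preceded by a backslash and the
# pattern matches the text literally.
escaped_match = re.fullmatch(re.escape(text), text)

print("raw pattern matches:", raw_match is not None)
print("escaped pattern matches:", escaped_match is not None)
```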
msg383240 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2020-12-17 13:02
> If the escape-result still contains a verbatim newline, why escape this character at all?

Because in verbose mode an unescaped newline is ignored. This is why \n and other ASCII whitespace characters, and '#', which starts a comment, should be escaped.

'\\\n' and '\\n' have identical meaning independent of mode and context, so there is no bug here. Escaping '\n' as '\\\n' is slightly simpler.
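
The verbose-mode point can be sketched as follows. This is only an illustration under the assumption of Python 3.7 or later; the sample string 'a\nb' and the variable names are made up for the example.

```python
import re

sample = "a\nb"  # hypothetical text containing a literal newline

# Unescaped: in verbose mode the newline in the pattern is ignored,
# so the pattern is effectively 'ab'.
unescaped = re.compile(sample, re.VERBOSE)
print(unescaped.fullmatch("ab") is not None)     # newline was dropped
print(unescaped.fullmatch(sample) is not None)   # no longer matches itself

# Escaped: the backslash keeps the newline significant in verbose mode.
escaped = re.compile(re.escape(sample), re.VERBOSE)
print(escaped.fullmatch(sample) is not None)

# Both spellings of the escape match a newline, with or without VERBOSE.
results = [
    re.fullmatch(pattern, "\n", flags) is not None
    for pattern in ("\\\n", "\\n")
    for flags in (0, re.VERBOSE)
]
print(results)
```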
msg383247 - (view) Author: Martin Altmayer (MartinAltmayer) * Date: 2020-12-17 14:01
Thanks for the explanation, I did not know about re.VERBOSE. I still think the behavior is a bit confusing, but it's probably not worth the effort to change this.
History
Date                 User              Action  Args
2020-12-17 14:41:45  serhiy.storchaka  set     status: open -> closed; resolution: not a bug; stage: resolved
2020-12-17 14:01:31  MartinAltmayer    set     type: behavior -> enhancement; messages: + msg383247
2020-12-17 13:02:31  serhiy.storchaka  set     nosy: + serhiy.storchaka; messages: + msg383240
2020-12-17 12:55:57  mrabarnett        set     messages: + msg383239
2020-12-17 12:00:52  MartinAltmayer    create