classification
Title: Deprecate unrecognized backslash+letter escapes in re
Type: enhancement Stage: resolved
Components: Library (Lib), Regular Expressions Versions: Python 3.5
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: serhiy.storchaka Nosy List: cvrebert, ezio.melotti, mrabarnett, pitrou, python-dev, serhiy.storchaka
Priority: normal Keywords: patch

Created on 2015-03-09 13:28 by serhiy.storchaka, last changed 2015-03-24 23:35 by python-dev. This issue is now closed.

Files
File name Uploaded Description Edit
re_deprecate_escaped_letters.patch serhiy.storchaka, 2015-03-09 13:28 review
Messages (3)
msg237645 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-03-09 13:28
Regular expressions use the backslash character for two functions:
1) to indicate special forms;
2) to allow special characters to be used without invoking their special meaning.

If backslash + character is not recognized as a special form (1), it is interpreted in meaning (2).
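
A minimal sketch of the two roles and the fallback rule, using Python's re module. The sample patterns are chosen here for illustration and are not taken from the patch:

```python
import re

# (1) Special form: \d matches any decimal digit, not the letter "d".
digit_matches = re.fullmatch(r"\d", "7") is not None
letter_d_matches = re.fullmatch(r"\d", "d") is not None

# (2) Escaped special character: \. matches only a literal dot,
# whereas an unescaped . matches any character except a newline.
escaped_dot_matches_x = re.fullmatch(r"\.", "x") is not None
bare_dot_matches_x = re.fullmatch(r".", "x") is not None

# Fallback: \@ is not a special form, so it is read as a literal "@".
at_sign_matches = re.fullmatch(r"\@", "@") is not None

print(digit_matches, letter_d_matches)            # True False
print(escaped_dot_matches_x, bare_dot_matches_x)  # False True
print(at_sign_matches)                            # True
```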

Usually new special forms take the form backslash + ASCII letter, because, unlike other characters, single ASCII letters do not have a special meaning in any regular expression engine or programming language. This makes using the backslash with an unrecognized ASCII letter dangerous. Currently it means just that letter literally, but in the future it could become a special form. For example, the \u and \U forms were added in 3.3, and this could break regular expression patterns that used \u and \U before.
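
To illustrate the \u case, here is a small sketch, assuming Python 3.3 or later and a str (not bytes) pattern. The pattern itself is a made-up example:

```python
import re

# Raw string: the regex engine itself receives backslash, "u", "0041".
pattern = r"\u0041"

# With \u recognized as a special form, the pattern denotes U+0041 ("A").
matches_letter_a = re.fullmatch(pattern, "A") is not None

# The older literal reading (a plain "u" followed by "0041") no longer applies.
matches_literal_text = re.fullmatch(pattern, "u0041") is not None

print(matches_letter_a, matches_literal_text)  # True False
```

A pattern written before 3.3 that relied on \u meaning a plain "u" would silently change meaning in this way.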

To avoid possible breakage it makes sense to reject unrecognized backslash + ASCII letter sequences. The proposed patch adds deprecation warnings when an unknown escape of an ASCII letter is used. The idea was proposed by Matthew Barnett [1].
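
One way to check how a given interpreter treats such a pattern is sketched below. The helper name classify_pattern is hypothetical and not part of the patch. It assumes the warning is issued as a DeprecationWarning, as in Python 3.5; later releases raise re.error instead, so both outcomes are handled:

```python
import re
import warnings


def classify_pattern(pattern):
    """Return 'ok', 'deprecated', or 'error' for a regex pattern string.

    Hypothetical helper for illustration only.
    """
    with warnings.catch_warnings():
        # Turn deprecation warnings into exceptions so they can be detected.
        warnings.simplefilter("error", DeprecationWarning)
        # Clear the compiled-pattern cache so the pattern is parsed again.
        re.purge()
        try:
            re.compile(pattern)
        except DeprecationWarning:
            return "deprecated"
        except re.error:
            return "error"
    return "ok"


print(classify_pattern(r"\d"))  # recognized special form
print(classify_pattern(r"\@"))  # escaped non-letter, still allowed
print(classify_pattern(r"\q"))  # unknown backslash + ASCII letter
```

The first two calls should report "ok"; the last reports "deprecated" or "error" depending on the Python version.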

[1] http://permalink.gmane.org/gmane.comp.python.devel/151657
msg239181 - Author: Roundup Robot (python-dev) Date: 2015-03-24 20:59
New changeset 014031a4d398 by Serhiy Storchaka in branch 'default':
Issue #23622: Unknown escapes in regular expressions that consist of ``'\'``
https://hg.python.org/cpython/rev/014031a4d398
msg239197 - Author: Roundup Robot (python-dev) Date: 2015-03-24 23:35
New changeset 7384db2fce8a by Serhiy Storchaka in branch 'default':
Fixed using deprecated escaping in regular expression in _strptime.py (issue23622).
https://hg.python.org/cpython/rev/7384db2fce8a
History
Date User Action Args
2015-03-24 23:35:06 python-dev set messages: + msg239197
2015-03-24 21:18:22 serhiy.storchaka set status: open -> closed; resolution: fixed; stage: patch review -> resolved
2015-03-24 20:59:00 python-dev set nosy: + python-dev; messages: + msg239181
2015-03-13 18:03:34 serhiy.storchaka set title: Deprecate unrecognized backslash+letter escapes -> Deprecate unrecognized backslash+letter escapes in re
2015-03-13 17:33:16 cvrebert set nosy: + cvrebert
2015-03-09 13:28:27 serhiy.storchaka create