classification
Title: Deprecate accepting unrecognized braces in regular expressions
Type: enhancement Stage: resolved
Components: Library (Lib), Regular Expressions Versions: Python 3.7
process
Status: closed Resolution: rejected
Dependencies: Superseder:
Assigned To: serhiy.storchaka Nosy List: ezio.melotti, jwilk, mrabarnett, serhiy.storchaka
Priority: normal Keywords: patch

Created on 2017-11-18 11:12 by serhiy.storchaka, last changed 2018-12-23 10:14 by serhiy.storchaka. This issue is now closed.

Pull Requests
URL Status Linked Edit
PR 4454 closed serhiy.storchaka, 2017-11-18 18:08
Messages (2)
msg306479 Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-11-18 11:12
Currently `{m}`, `{m,n}`, `{m,}` and `{,n}`, where m and n are non-negative decimal numbers, are accepted in regular expressions as quantifiers that repeat the previous RE from m (0 by default) to n (infinity by default) times.

But if the opening brace '{' is not followed by one of the above patterns, it is treated as just the literal '{'.

>>> import re
>>> re.search('(foobar){e}', 'xirefoabralfobarxie')
>>> re.search('(foobar){e}', 'foobar{e}')
<re.Match object; span=(0, 9), match='foobar{e}'>

This conflicts with the regex module, which uses braces to define "fuzzy" matching.

>>> import regex
>>> regex.search('(foobar){e}', 'xirefoabralfobarxie')
<regex.Match object; span=(0, 6), match='xirefo', fuzzy_counts=(6, 0, 0)>
>>> regex.search('(foobar){e}', 'foobar{e}')
<regex.Match object; span=(0, 6), match='foobar'>

I don't think it is worth adding support for fuzzy matching to the re module, but for compatibility it would be better to raise an error or a warning when '{' is not followed by one of the recognized patterns. This could also help to catch typos and errors in regular expressions, e.g. '-{1.2}' or '-{1, 2}' instead of '-{1,2}'.
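
To illustrate the typo case, here is a small sketch of how re silently treats a mistyped quantifier as literal text. The helper name `fully_matches` is hypothetical and only exists to keep the example short.

```python
import re

def fully_matches(pattern, text):
    """Return True if the whole of text matches pattern."""
    return re.fullmatch(pattern, text) is not None

# Intended quantifier: one or two hyphens.
intended = fully_matches(r'-{1,2}', '--')

# Typo with a space: the braces are taken literally, so '--' is no
# longer matched, but the literal text '-{1, 2}' is.
space_typo_on_hyphens = fully_matches(r'-{1, 2}', '--')
space_typo_on_literal = fully_matches(r'-{1, 2}', '-{1, 2}')

# Typo with a dot: the braces are literal again, and '.' matches any
# single character between '1' and '2'.
dot_typo_on_hyphens = fully_matches(r'-{1.2}', '--')
dot_typo_on_literal = fully_matches(r'-{1.2}', '-{1x2}')
```

No exception or warning is raised for either typo, which is why such mistakes are easy to miss.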

Possible variants:

1. Emit a DeprecationWarning in 3.7 (and 2.7.15 with the -3 option), raise a re.error in 3.8 or 3.9.

2. Emit a PendingDeprecationWarning in 3.7, a DeprecationWarning in 3.8, and raise a re.error in 3.9 or 3.10.

3. Emit a RuntimeWarning or SyntaxWarning in 3.7 and forever.

4. Emit a FutureWarning in 3.7, and implement the fuzzy matching or replace re with regex sometime in the future. Unlikely.
msg306491 Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-11-18 18:14
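
As a rough illustration of what any of the warning variants above would need to detect, here is a minimal sketch. It is not the patch from PR 4454. The names `find_unrecognized_braces` and `check_pattern` are hypothetical, and the scan is deliberately simplified: it skips backslash-escaped characters, does not track character classes such as '[{]', and treats only the four forms listed in this issue as quantifiers.

```python
import re
import warnings

# The four quantifier forms listed in this issue: {m}, {m,n}, {m,} and {,n}.
_QUANTIFIER = re.compile(r'\{(?:\d+(?:,\d*)?|,\d+)\}')

def find_unrecognized_braces(pattern):
    """Return offsets of each '{' that does not start one of the four forms.

    Assumption: a simplified scan, not the real re parser.
    """
    offsets = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == '\\':
            i += 2  # skip the backslash and the character it escapes
            continue
        if ch == '{':
            m = _QUANTIFIER.match(pattern, i)
            if m:
                i = m.end()  # a recognized quantifier, move past it
                continue
            offsets.append(i)
        i += 1
    return offsets

def check_pattern(pattern):
    """Emit one PendingDeprecationWarning per unrecognized '{' (as in variant 2)."""
    for offset in find_unrecognized_braces(pattern):
        warnings.warn(
            f"unrecognized '{{' at position {offset} in {pattern!r}",
            PendingDeprecationWarning,
            stacklevel=2,
        )
```

Calling `check_pattern('(foobar){e}')` would warn about the brace at position 8, while `check_pattern('-{1,2}')` would stay silent. Note that PendingDeprecationWarning is ignored by default, so the warning is only visible with a suitable warnings filter.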
Since this will require changing regular expressions in several places in the stdlib, I have chosen to emit a PendingDeprecationWarning with a long deprecation period.

But I'm no longer sure that this is a good idea. Unescaped braces may be widely used in the wild. It may be better to use another heuristic for recognizing the fuzzy matching and other possible extensions that use braces.
History
Date User Action Args
2018-12-23 10:14:12 serhiy.storchaka set status: open -> closed
resolution: rejected
stage: patch review -> resolved
2017-11-24 17:51:14 jwilk set nosy: + jwilk
2017-11-18 18:14:02 serhiy.storchaka set messages: + msg306491
2017-11-18 18:08:22 serhiy.storchaka set keywords: + patch
stage: patch review
pull_requests: + pull_request4392
2017-11-18 11:46:05 serhiy.storchaka set title: Deprecate accepting -> Deprecate accepting unrecognized braces in regular expressions
2017-11-18 11:12:27 serhiy.storchaka create