Author serhiy.storchaka
Recipients Alcolo Alcolo, ezio.melotti, mrabarnett, r.david.murray, serhiy.storchaka
Date 2017-11-20.00:04:13
SpamBayes Score -1.0
Marked as misclassified Yes
Message-id <1511136253.44.0.213398074469.issue25054@psf.upfronthosting.co.za>
In-reply-to
Content
PR 4471 fixes this issue, issue1647489, and a couple of similar issues.

The most visible change is the change in re.split(). This is compatibility breaking change, and it affects third-party code. But ValueError or FutureWarning were raised for patterns that will change the behavior in this PR for two Python releases, since Python 3.5. Developers had enough time for fixing them. In most cases this is so trivial as changing `*` to `+` in `\s*`.

Changes in sub(), findall(), and finditer() are less visible. No one existing test needs modification for them. Was:

>>> re.split(r"\b|:+", "a::bc")
/usr/lib/python3.6/re.py:212: FutureWarning: split() requires a non-empty pattern match.
  return _compile(pattern, flags).split(string, maxsplit)
['a:', 'bc']
>>> re.sub(r"\b|:+", "-", "a::bc")
'-a-:-bc-'
>>> re.findall(r"\b|:+", "a::bc")
['', '', ':', '', '']
>>> list(re.finditer(r"\b|:+", "a::bc"))
[<_sre.SRE_Match object; span=(0, 0), match=''>, <_sre.SRE_Match object; span=(1, 1), match=''>, <_sre.SRE_Match object; span=(2, 3), match=':'>, <_sre.SRE_Match object; span=(3, 3), match=''>, <_sre.SRE_Match object; span=(5, 5), match=''>]

Fixed:

>>> re.split(r"\b|:+", "a::bc")
['', 'a', '', 'bc', '']
>>> re.sub(r"\b|:+", "-", "a::bc")
'-a--bc-'
>>> re.findall(r"\b|:+", "a::bc")
['', '', '::', '', '']
>>> list(re.finditer(r"\b|:+", "a::bc"))
[<re.Match object; span=(0, 0), match=''>, <re.Match object; span=(1, 1), match=''>, <re.Match object; span=(1, 3), match='::'>, <re.Match object; span=(3, 3), match=''>, <re.Match object; span=(5, 5), match=''>]

The behavior of re.split(), re.findall() and re.finditer() now is the same as in the regex module with the V1 flag. But the behavior of re.sub() left closer to the previous behavior, otherwise this would break existing tests. It is consistent with re.split() rather of re.findall() and re.finditer(). In regex with the V1 flag sub() is consistent with findall() and finditer(), but not with split().
History
Date User Action Args
2017-11-20 00:04:13serhiy.storchakasetrecipients: + serhiy.storchaka, ezio.melotti, mrabarnett, r.david.murray, Alcolo Alcolo
2017-11-20 00:04:13serhiy.storchakasetmessageid: <1511136253.44.0.213398074469.issue25054@psf.upfronthosting.co.za>
2017-11-20 00:04:13serhiy.storchakalinkissue25054 messages
2017-11-20 00:04:13serhiy.storchakacreate