Author serhiy.storchaka
Recipients docs@python, ezio.melotti, mrabarnett, rhettinger, serhiy.storchaka
Date 2017-12-04.10:08:05
Message-id <1512382086.34.0.213398074469.issue32211@psf.upfronthosting.co.za>
>>> re.findall(r'^|\w+', 'two words')
['', 'wo', 'words']

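It seems the current behavior was documented incorrectly in issue732120: after an empty match, the search for the next match resumes one character later, so the 't' of 'two' is skipped.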
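A minimal sketch of that pre-3.7 rule, assuming it can be modelled with Pattern.search() and an explicit start position: whenever a match is empty, the next search starts one character further on, so a non-empty match can never begin where an empty one just matched. findall_pre37 is a hypothetical helper written for illustration; it is not part of the re module.

```python
import re


def findall_pre37(pattern, string, flags=0):
    """Hypothetical emulation of re.findall() as it behaved before 3.7.

    Returns group 0 of every match (patterns without groups only).
    """
    regex = re.compile(pattern, flags)
    results = []
    pos = 0
    # Stop once past the end of the string, so an empty match at the
    # very end is not found again and again.
    while pos <= len(string):
        m = regex.search(string, pos)
        if m is None:
            break
        results.append(m.group())
        if m.end() == m.start():
            # Empty match: skip one character (the old, surprising rule).
            pos = m.end() + 1
        else:
            # Non-empty match: continue right where it ended.
            pos = m.end()
    return results


print(findall_pre37(r'^|\w+', 'two words'))  # ['', 'wo', 'words']
print(re.findall(r'^|\w+', 'two words'))     # ['', 'two', 'words'] on 3.7+
```

Note that search() with a start position does not treat that position as the beginning of the string, so '^' only matches at index 0 here, just as in the real findall().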

It will be fixed in 3.7 (see issue1647489, issue25054), but I hesitate to backport the fix to 3.6 and 2.7 because it could break user code. For example:

In Python 3.6:

>>> list(re.finditer(r'(?m)^\s*?$', 'foo\n\n\nbar'))
[<_sre.SRE_Match object; span=(4, 4), match=''>, <_sre.SRE_Match object; span=(5, 5), match=''>]

In Python 3.7:

>>> list(re.finditer(r'(?m)^\s*?$', 'foo\n\n\nbar'))
[<re.Match object; span=(4, 4), match=''>, <re.Match object; span=(4, 5), match='\n'>, <re.Match object; span=(5, 5), match=''>]

(This is a real pattern used in the doctest module, but with re.sub().)
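To make the compatibility risk concrete, here is a rough sketch that emulates the pre-3.7 matching rule with Pattern.search() and compares it with the running interpreter. spans_pre37 and sub_pre37 are hypothetical helpers for illustration only, and the emulated sub handles a plain replacement string, not group references. On 3.7 and later the extra (4, 5) match consumes a newline, so re.sub() with this pattern removes one of the blank lines that 3.6 left alone.

```python
import re


def spans_pre37(pattern, string, flags=0):
    """Hypothetical emulation of the spans re.finditer() produced before
    Python 3.7: after an empty match, the next search starts one
    character later."""
    regex = re.compile(pattern, flags)
    spans = []
    pos = 0
    while pos <= len(string):
        m = regex.search(string, pos)
        if m is None:
            break
        spans.append(m.span())
        pos = m.end() + 1 if m.end() == m.start() else m.end()
    return spans


def sub_pre37(pattern, repl, string, flags=0):
    """Hypothetical emulation of re.sub() with a plain replacement string,
    built on the pre-3.7 spans above."""
    pieces = []
    last = 0
    for start, end in spans_pre37(pattern, string, flags):
        pieces.append(string[last:start])  # text between matches
        pieces.append(repl)                # replacement for this match
        last = end
    pieces.append(string[last:])
    return ''.join(pieces)


text = 'foo\n\n\nbar'
pattern = r'(?m)^\s*?$'

print(spans_pre37(pattern, text))                      # [(4, 4), (5, 5)]
print([m.span() for m in re.finditer(pattern, text)])  # [(4, 4), (4, 5), (5, 5)] on 3.7+
print(repr(sub_pre37(pattern, '', text)))              # 'foo\n\n\nbar' (unchanged)
print(repr(re.sub(pattern, '', text)))                 # 'foo\n\nbar' on 3.7+
```

Code that depends on the old spans, such as the re.sub() call mentioned above, would silently produce different output if the fix were backported.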

The proposed PR documents the current weird behavior in 2.7 and 3.6.