classification
Title: Document the bug in re.findall() and re.finditer() in 2.7 and 3.6
Type: enhancement Stage: resolved
Components: Documentation, Regular Expressions Versions: Python 3.6, Python 2.7
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: docs@python Nosy List: docs@python, ezio.melotti, mrabarnett, rhettinger, serhiy.storchaka
Priority: normal Keywords: patch

Created on 2017-12-04 10:08 by serhiy.storchaka, last changed 2018-01-04 12:08 by serhiy.storchaka. This issue is now closed.

Pull Requests
URL Status Linked Edit
PR 4695 merged serhiy.storchaka, 2017-12-04 10:12
PR 5096 merged serhiy.storchaka, 2018-01-04 09:32
Messages (3)
msg307546 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-12-04 10:08
>>> re.findall(r'^|\w+', 'two words')
['', 'wo', 'words']

It seems the current behavior was documented incorrectly in issue732120.

It will be fixed in 3.7 (see issue1647489, issue25054), but I hesitate to backport the fix to 3.6 and 2.7 because doing so could break user code. For example:

In Python 3.6:

>>> list(re.finditer(r'(?m)^\s*?$', 'foo\n\n\nbar'))
[<_sre.SRE_Match object; span=(4, 4), match=''>, <_sre.SRE_Match object; span=(5, 5), match=''>]

In Python 3.7:

>>> list(re.finditer(r'(?m)^\s*?$', 'foo\n\n\nbar'))
[<re.Match object; span=(4, 4), match=''>, <re.Match object; span=(4, 5), match='\n'>, <re.Match object; span=(5, 5), match=''>]

(This is a real pattern used in the doctest module, but with re.sub().)

The proposed PR documents the current weird behavior in 2.7 and 3.6.
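
For anyone who wants to reproduce the old results on a newer interpreter, here is a rough sketch. It assumes the pre-3.7 rule was simply "after an empty match, restart the search one character further on", which is why a non-empty match starting at the same position is never found. The helper name finditer_pre37 is hypothetical and not part of the re module; it only illustrates the scanning rule and is not the actual C implementation.

```python
import re


def finditer_pre37(pattern, string):
    """Hypothetical helper approximating the pre-3.7 scanning rule.

    Assumption: after an empty match the old scanner resumed one
    character past it, so the character at that position could be
    skipped by later matches.
    """
    regex = re.compile(pattern)
    pos = 0
    while pos <= len(string):
        match = regex.search(string, pos)
        if match is None:
            return
        yield match
        if match.start() == match.end():
            # Empty match: step over one character before searching again.
            pos = match.end() + 1
        else:
            # Non-empty match: continue right where it ended.
            pos = match.end()


words = [m.group() for m in finditer_pre37(r'^|\w+', 'two words')]
blank_spans = [m.span() for m in finditer_pre37(r'(?m)^\s*?$', 'foo\n\n\nbar')]

print(words)        # ['', 'wo', 'words']
print(blank_spans)  # [(4, 4), (5, 5)]
```

Both results agree with the 2.7/3.6 outputs quoted above. Passing a start position to search() keeps '^' anchored to the real beginning of the string (and to positions after a newline in multiline mode), which is what makes the emulation line up.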
msg309459 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2018-01-04 09:08
New changeset 1e6d8525f9dd3dcdc83adb93b164082c8b95d17a by Serhiy Storchaka in branch '3.6':
bpo-32211: Document the existing bug in re.findall() and re.finditer(). (#4695)
https://github.com/python/cpython/commit/1e6d8525f9dd3dcdc83adb93b164082c8b95d17a
msg309463 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2018-01-04 12:08
New changeset ca54740f257086393106d242644d450485180b96 by Serhiy Storchaka in branch '2.7':
[2.7] bpo-32211: Document the existing bug in re.findall() and re.finditer(). (GH-4695). (#5096)
https://github.com/python/cpython/commit/ca54740f257086393106d242644d450485180b96
History
Date User Action Args
2018-01-04 12:08:33 serhiy.storchaka set messages: + msg309463
2018-01-04 09:34:08 serhiy.storchaka set status: open -> closed; resolution: fixed; stage: patch review -> resolved
2018-01-04 09:32:12 serhiy.storchaka set pull_requests: + pull_request4965
2018-01-04 09:08:26 serhiy.storchaka set messages: + msg309459
2017-12-04 10:12:51 serhiy.storchaka set keywords: + patch; stage: patch review; pull_requests: + pull_request4608
2017-12-04 10:08:06 serhiy.storchaka create