re module: wrong capturing groups #78475
Comments
I am experiencing an issue with the following regex when using finditer.
(I know it's not the best method of dealing with HTML, and this is a simplified version) For example:
In Python 2.7, 3.5, and 3.6 it returns
But starting with 3.7 it returns
The "text" group appears to be a copy of the previous "text" group. Some other examples:
|
➜ cpython git:(70d56fb) ✗ ./python.exe
➜ cpython git:(e69fbb6) ✗ ./python.exe
Does this have something to do with 70d56fb (bpo-25054, bpo-1647489)? Thanks |
This bug silently generates wrong results, so I suggest marking it as a release blocker for 3.7.1 |
Simplifying the test case, it seems the
Python 3.6.8 (tags/v3.6.8:3c6b436a57, Dec 24 2018, 00:16:47)
>>> import re
>>> re.findall(r"(?=(<\w+>)(<\w+>)?)", "<aaa><bbb>")
[('<aaa>', '<bbb>'), ('<bbb>', '')]
Python 3.7.2 (tags/v3.7.2:9a3ffc0492, Dec 23 2018, 23:09:28)
>>> import re
>>> re.findall(r"(?=(<\w+>)(<\w+>)?)", "<aaa><bbb>")
[('<aaa>', '<bbb>'), ('<bbb>', '<bbb>')] |
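One way to check whether an interpreter shows this behavior is to compare the groups reported by finditer with the groups from an independent match at the same position. The sketch below is only an illustration: the helper name `stale_group_mismatches` is hypothetical, and it assumes that a separate `Pattern.match(text, pos)` call does not carry capture state over from earlier attempts. Patterns that can match the empty string in more than one way may produce legitimate differences, so treat a non-empty result as a hint rather than proof.

```python
import re


def stale_group_mismatches(pattern, text):
    """Return (position, finditer_groups, fresh_groups) for each disagreement.

    Hypothetical helper: every match produced by finditer is re-checked with
    a separate match() call anchored at the same start position.
    """
    compiled = re.compile(pattern)
    mismatches = []
    for m in compiled.finditer(text):
        # A separate call starts from its own matching state.
        fresh = compiled.match(text, m.start())
        fresh_groups = fresh.groups() if fresh is not None else None
        if m.groups() != fresh_groups:
            mismatches.append((m.start(), m.groups(), fresh_groups))
    return mismatches


# Pattern and input taken from the simplified test case above.
print(stale_group_mismatches(r"(?=(<\w+>)(<\w+>)?)", "<aaa><bbb>"))
```

On an interpreter without the bug the list should be empty; on an affected one, the second match would be expected to appear in it with a leftover value in the optional group.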
I tried to fix it; feel free to create a new PR if you don't want this one. PR11546 has a small question, should FYI, function Lines 340 to 352 in d4f9cf5
|
Serhiy Storchaka lost his sight. If any other core developer wants to review this patch, I would like to give a detailed explanation; the logic is not very complicated. |
The original post's bug was introduced in Python 3.7.0. While investigating the code, I found another bug in capturing groups. This one has existed since very early versions.
Python 3.4.4 (v3.4.4:737efcadf5a6, Dec 20 2015, 19:28:18) [MSC v.1600 32 bit (Intel)] on win32
>>> import re
>>> re.search(r"\b(?=(\t)|(x))x", "a\tx").groups()
('', 'x')
Expected result: (None, 'x')
Python 3.7.2 (tags/v3.7.2:9a3ffc0492, Dec 23 2018, 23:09:28) [MSC v.1916 64 bit (AMD64)] on win32
>>> import regex
>>> regex.search(r"\b(?=(\t)|(x))x", "a\tx").groups()
(None, 'x') |
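For code that has to run on an affected interpreter, one possible workaround is to do the scanning in Python and call match at each position, so that every attempt is a separate call. This is a minimal sketch, not the fix from the PR: `search_fresh` is a hypothetical name, and the approach assumes that separate `match` calls do not share capture state. It is also slower than `re.search`, because the scan runs in Python rather than inside the regex engine.

```python
import re


def search_fresh(pattern, text):
    """Emulate re.search by trying match() at each position in turn.

    Hypothetical workaround: each attempt is an independent call, so groups
    set during a failed attempt at an earlier position cannot show up in the
    result.
    """
    compiled = re.compile(pattern)
    for pos in range(len(text) + 1):
        # Passing pos (instead of slicing) keeps \b and lookbehind context.
        m = compiled.match(text, pos)
        if m is not None:
            return m
    return None


# Pattern and input taken from the report above.
m = search_fresh(r"\b(?=(\t)|(x))x", "a\tx")
print(m.groups())  # (None, 'x')
```

Group 1 stays None here because the attempt that succeeds at position 2 never enters the `(\t)` branch.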
Thank you for your PR, Ma Lin! |