This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: slow matching on regular expression
Type: performance Stage: resolved
Components: Regular Expressions Versions: Python 3.7
process
Status: closed Resolution: wont fix
Dependencies: Superseder:
Assigned To: Nosy List: HeRaNO, ezio.melotti, mrabarnett, serhiy.storchaka
Priority: normal Keywords:

Created on 2022-02-22 10:23 by HeRaNO, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
match.py HeRaNO, 2022-02-22 10:23
Messages (3)
msg413700 - (view) Author: Heran Yang (HeRaNO) Date: 2022-02-22 10:23
I'm using `re.fullmatch` to match a string that only contains 0 and 1. The regular expression is: (0+|1(01*0)*1)+

It runs rather slow with Python 3.7, but when I try using regex in C++, with std::regex_constants::__polynomial, it works well.

Would someone take a look at it? Thx.
msg413708 - (view) Author: Matthew Barnett (mrabarnett) * (Python triager) Date: 2022-02-22 14:36
The expression is a repeated alternative where the first alternative is a repeat. Repeated repeats can result in a lot of attempts and backtracking and should be avoided.

Try this instead:

    (0|1(01*0)*1)+
msg413730 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2022-02-22 18:17
The re module does not support features corresponding to std::regex_constants::__polynomial in C++. Rewrite your regular expression or try to use alternative regex implementations (for example wrappers around the re2 library or C++ regex library).
History
Date User Action Args
2022-04-11 14:59:56adminsetgithub: 90981
2022-02-22 18:17:22serhiy.storchakasetstatus: open -> closed

nosy: + serhiy.storchaka
messages: + msg413730

resolution: wont fix
stage: resolved
2022-02-22 14:36:28mrabarnettsetmessages: + msg413708
2022-02-22 10:23:02HeRaNOcreate