This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Author gdr@garethrees.org
Recipients Walter Farrell, ezio.melotti, gdr@garethrees.org, mrabarnett
Date 2016-11-14.20:55:24
SpamBayes Score -1.0
Marked as misclassified Yes
Message-id <1479156925.14.0.601715521823.issue28690@psf.upfronthosting.co.za>
In-reply-to
Content
This is a well-known gotcha with backtracking regexp implementations. The problem is that in the alternation "( +|'[^']*'|\"[^\"]*\"|[^>]+)" there are some characters (space, apostrophe, double quotes) that match multiple alternatives (for example a space matches both " +" and "[^>]+"). This causes the regexp engine to have to backtrack for each ambiguous character to try out the other alternatives, leading to runtime that's exponential in the number of ambiguous characters.

Linear behaviour can be restored if you make the alternation unambiguous, like this: ( +|'[^']*'|\"[^\"]*\"|[^>'\"]+)
History
Date User Action Args
2016-11-14 20:55:25gdr@garethrees.orgsetrecipients: + gdr@garethrees.org, ezio.melotti, mrabarnett, Walter Farrell
2016-11-14 20:55:25gdr@garethrees.orgsetmessageid: <1479156925.14.0.601715521823.issue28690@psf.upfronthosting.co.za>
2016-11-14 20:55:25gdr@garethrees.orglinkissue28690 messages
2016-11-14 20:55:24gdr@garethrees.orgcreate