Message 129262 - Python tracker


Author	nikomatsakis
Recipients	nikomatsakis
Date	2011-02-24.12:57:46
SpamBayes Score	2.346941e-10
Marked as misclassified	No
Message-id	<1298552267.25.0.174570607151.issue11307@psf.upfronthosting.co.za>
In-reply-to

Content

Executing code like this:

>>> import re
>>> r = re.compile(r'(\w+)*=.*')
>>> r.match("abcdefghijklmnopqrstuvwxyz")

takes a long time (around 12 seconds on my machine).  Presumably this is because the matcher is enumerating all the ways to divide the alphabet among repetitions of (\w+), even though there is no "=" sign to be found.  In contrast, in Perl a regular expression like that seems to run instantly.
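
A rough way to see the growth is to time the failing match on progressively longer prefixes of the alphabet. This is only a sketch: the helper name time_failed_match is made up for illustration, the inputs are kept short so it finishes quickly, and the actual timings will depend on the interpreter version and machine.

```python
import re
import time

# Pattern from the report: a repeated group that itself contains a repeat.
PATTERN = re.compile(r'(\w+)*=.*')


def time_failed_match(n):
    """Return (match result, seconds) for the first n letters, which contain no '='."""
    subject = "abcdefghijklmnopqrstuvwxyz"[:n]
    start = time.perf_counter()
    result = PATTERN.match(subject)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    # Kept to 20 characters; the full 26 is the slow case described above.
    for n in range(10, 21, 2):
        result, seconds = time_failed_match(n)
        print(f"{n:2d} chars: match={result!r} {seconds:.4f}s")
```

If the matcher really is trying every split, n word characters can be divided among repetitions of (\w+) in 2**(n-1) ways, so each extra pair of characters should cost roughly four times as much.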

This could be optimized by recognizing that no "=" sign was found, and thus it does not matter how the first part of the regular expression matches, so there is no need to try additional possibilities.  To some extent, of course, the answer is just "don't write regular expressions like that."  This example is reduced from a real regexp where the potential inefficiency was less obvious.  Nonetheless, an optimization that recognizes when further re-enumeration cannot change the outcome seems worthwhile in general.
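
Until something like that exists in the engine, the same idea can be approximated by hand. The sketch below shows two workarounds, not anything the re module does itself: a check for the required literal "=" before running the pattern, and a rewrite of the pattern that I believe gives the same result from match() without the nested repeat. The names SLOW, FAST and match_with_prefilter are hypothetical.

```python
import re

SLOW = re.compile(r'(\w+)*=.*')  # pattern from the report
FAST = re.compile(r'(\w+)?=.*')  # assumed-equivalent rewrite: at most one greedy run


def match_with_prefilter(subject):
    """Skip the regex entirely when the required literal '=' is absent."""
    if "=" not in subject:
        return None
    return SLOW.match(subject)


if __name__ == "__main__":
    alphabet = "abcdefghijklmnopqrstuvwxyz"
    print(match_with_prefilter(alphabet))  # returns without running the regex
    print(FAST.match(alphabet))            # fails after a linear number of attempts
    print(FAST.match("key=value").group(1))
```

The literal check only covers the case in this report. A subject such as the alphabet followed by " =" contains an "=" but still cannot match, so SLOW would backtrack just as before; rewriting the pattern avoids that case too.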

In any case, I am submitting the bug report merely to raise the issue as a possible future optimization, not to suggest that it must be addressed immediately (or even at all).

History
Date	User	Action	Args
2011-02-24 12:57:47	nikomatsakis	set	recipients: + nikomatsakis
2011-02-24 12:57:47	nikomatsakis	set	messageid: <1298552267.25.0.174570607151.issue11307@psf.upfronthosting.co.za>
2011-02-24 12:57:46	nikomatsakis	link	issue11307 messages
2011-02-24 12:57:46	nikomatsakis	create