Message 229770 - Python tracker

Author	serhiy.storchaka
Recipients	georg.brandl, inkerman, serhiy.storchaka
Date	2014-10-21.18:09:13
Message-id	<1413914954.1.0.737795875817.issue22687@psf.upfronthosting.co.za>

Content
This particular case is related to the worst-case behavior of the wordsep_re regular expression. When the text contains a long sequence of word characters that does not end with a hyphen, or a long sequence of non-word, non-space characters (and in some other cases), matching this regular expression takes quadratic time. This is a peculiarity of the current implementation of the regular expression engine. It may be possible to rewrite the regular expression so that the quadratic complexity goes away, but this is not easy.

The workaround is to use break_on_hyphens=False.
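
A minimal sketch of the workaround, assuming the input is one long run of word characters with no hyphen (the kind of text described above). The helper name wrap_no_hyphen_breaks and the sample string are hypothetical, chosen only for illustration. Whether the default settings are actually slow on this input depends on the Python version in use, so no timings are claimed here.

```python
import textwrap


def wrap_no_hyphen_breaks(text, width=70):
    """Wrap text with break_on_hyphens=False.

    With this option textwrap splits the text on whitespace only,
    so the hyphen-aware wordsep_re pattern is not used.
    """
    return textwrap.wrap(text, width=width, break_on_hyphens=False)


# Hypothetical sample input: a single 20000-character "word" without hyphens.
long_word = "a" * 20000

# break_long_words defaults to True, so the over-long word is still
# cut into pieces that fit the requested width.
lines = wrap_no_hyphen_breaks(long_word, width=80)

print(len(lines), "lines, longest is", max(len(line) for line in lines))
```

The trade-off is that hyphenated compounds are no longer split at the hyphen: a word such as "well-known" stays in one piece, so lines may wrap less tightly than with the default setting.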

History
Date	User	Action	Args
2014-10-21 18:09:14	serhiy.storchaka	set	recipients: + serhiy.storchaka, georg.brandl, inkerman
2014-10-21 18:09:14	serhiy.storchaka	set	messageid: <1413914954.1.0.737795875817.issue22687@psf.upfronthosting.co.za>
2014-10-21 18:09:14	serhiy.storchaka	link	issue22687 messages
2014-10-21 18:09:13	serhiy.storchaka	create