Author serhiy.storchaka
Recipients georg.brandl, inkerman, serhiy.storchaka
Date 2014-10-21.18:09:13
Message-id <1413914954.1.0.737795875817.issue22687@psf.upfronthosting.co.za>
In-reply-to
Content
This particular case is related to the worst-case behavior of the wordsep_re regular expression. When the text contains a long sequence of word characters that does not end with a hyphen, or a long sequence of non-word, non-space characters (and in some other cases), matching this regular expression has quadratic computational complexity. This is a peculiarity of the current implementation of the regular expression engine. It may be possible to rewrite the regular expression so that the quadratic complexity goes away, but this is not so easy.

The workaround: use break_on_hyphens=False.
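
A minimal sketch of the workaround, for illustration only. The input string is a made-up example of the problematic shape (a long run of word characters with no hyphen), and the width of 70 is an arbitrary choice. As far as I know, with break_on_hyphens=False textwrap splits on whitespace only and does not use wordsep_re at all.

```python
import textwrap

# Hypothetical input: one very long run of word characters with no
# hyphen, the kind of text described above as a worst case.
text = "prefix " + "a" * 5000 + " suffix"

# break_on_hyphens=False tells textwrap to split on whitespace only,
# so the hyphen-aware wordsep_re pattern is not applied to the text.
lines = textwrap.wrap(text, width=70, break_on_hyphens=False)

# Long words are still broken to fit the width (break_long_words
# defaults to True), so every output line respects the limit.
print(len(lines), max(len(line) for line in lines))
```

The trade-off is that hyphenated compound words are no longer split at the hyphen; they are treated as single words and only broken if they exceed the width.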
History
Date User Action Args
2014-10-21 18:09:14 serhiy.storchaka set recipients: + serhiy.storchaka, georg.brandl, inkerman
2014-10-21 18:09:14 serhiy.storchaka set messageid: <1413914954.1.0.737795875817.issue22687@psf.upfronthosting.co.za>
2014-10-21 18:09:14 serhiy.storchaka link issue22687 messages
2014-10-21 18:09:13 serhiy.storchaka create