Author serhiy.storchaka
Recipients georg.brandl, inkerman, pitrou, roippi, serhiy.storchaka
Date 2014-11-13.14:43:03
Message-id <1415889784.57.0.183690198962.issue22687@psf.upfronthosting.co.za>
Content
This is an old rule: \w{2,}-(?=\w{2,}), meaning a single letter shouldn't be separated. But such a simple regex had a bug: it splits a word after a non-word character (in particular an apostrophe or a hyphen) if that character is followed by word characters and a hyphen. There were attempts to fix this bug in issue596434 and issue965425, but they missed the cases where a non-word character occurs inside a word.
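A minimal sketch of the rule and of the bug, assuming the rule is applied on its own through re.split. The names SIMPLE_RULE and chunks are hypothetical and exist only for this illustration; this is not the full pattern textwrap uses, which combines this rule with other alternatives.

```python
import re

# The hyphenation rule quoted above, wrapped in a capturing group so that
# re.split() keeps the matched "xx-" pieces in its result.
SIMPLE_RULE = re.compile(r'(\w{2,}-(?=\w{2,}))')


def chunks(word):
    """Split a word at the break points allowed by the simple rule."""
    return [c for c in SIMPLE_RULE.split(word) if c]


# Two or more letters on both sides of the hyphen: a break is allowed.
print(chunks("well-known"))          # ['well-', 'known']

# A single letter before the hyphen: no break.
print(chunks("e-mail"))              # ['e-mail']

# The bug: "you-" matches in the middle of the word, so the word is
# split right after the apostrophe.
print(chunks("what-d'you-call-it"))  # ["what-d'", 'you-', 'call-', 'it']
```

In the last case the hyphen after "what" is rejected because only one word character follows it, but the search then resumes inside the word and finds "you-", which leaves "what-d'" as a separate chunk.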

Originally I had assigned this issue only to 3.5 because I supposed that the solution would need either new features in re or backward-incompatible changes to the word splitting algorithm. But the solution I found doesn't require 3.5-only features, doesn't change the interface, and fixes both performance and behavior bugs. So I think it should be applied to the maintained releases too.