Message231116
This is the old rule: \w{2,}-(?=\w{2,}), meaning a single letter shouldn't be separated. But there was a bug in such a simple regex: it splits a word after a non-word character (in particular an apostrophe or hyphen) if that character is followed by word characters and a hyphen. There were attempts to fix this bug in issue596434 and issue965425, but they missed the cases where a non-word character occurs inside a word.
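The behavior described above can be reproduced with a short sketch. It assumes the old rule is the quoted pattern with its closing parenthesis in place, and it wraps the pattern in a capturing group so that re.split() keeps the matched pieces. The name split_chunks is hypothetical and exists only for this illustration; this is not the actual textwrap code, which combines the rule with other alternatives.

```python
import re

# Assumption: the old rule quoted above, with its closing parenthesis in place,
# wrapped in a capturing group so re.split() keeps the matched chunks.
OLD_HYPHEN_RULE = re.compile(r"(\w{2,}-(?=\w{2,}))")


def split_chunks(text):
    """Split text at the break points the old rule allows, dropping empty strings."""
    return [chunk for chunk in OLD_HYPHEN_RULE.split(text) if chunk]


print(split_chunks("well-known"))          # ['well-', 'known']
print(split_chunks("e-mail"))              # ['e-mail']
print(split_chunks("what-d'you-call-it"))  # ["what-d'", 'you-', 'call-', 'it']
```

In the last case the single letter "d" keeps "what-" from being split off, as intended, but the rule then matches "you-" in the middle of the word, so a break lands right after the apostrophe.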
Originally I assigned this issue only to 3.5 because I supposed that the solution would need either new features in re or backward-incompatible changes to the word splitting algorithm. But the solution I found doesn't require 3.5-only features, doesn't change the interface, and fixes both performance and behavior bugs. So I think it should be applied to maintained releases too.
Date | User | Action | Args
2014-11-13 14:43:04 | serhiy.storchaka | set | recipients: + serhiy.storchaka, georg.brandl, pitrou, roippi, inkerman
2014-11-13 14:43:04 | serhiy.storchaka | set | messageid: <1415889784.57.0.183690198962.issue22687@psf.upfronthosting.co.za>
2014-11-13 14:43:04 | serhiy.storchaka | link | issue22687 messages
2014-11-13 14:43:03 | serhiy.storchaka | create |