Message176459
There's actually enormous backtracking here. Try this much shorter regexp and you'll see much the same behavior:
re_utf8 = r'^([\x00-\x7f]+)*$'
That's the original re_utf8 with all but the first alternative removed.
Looks like passing s[0:34] "works" because it eliminates the trailing \x8d that prevents the regexp from matching the whole string. Because the regexp cannot match the whole string, it takes a very long time to try all the futile combinations implied by the nested quantifiers. As the much simpler re_utf8 above shows, it's not the alternatives in the regexp that matter here, it's the nested quantifiers.
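A minimal sketch of how the slowdown can be observed with the shortened pattern. The helper name time_match and the test strings (a run of 'a' characters followed by \x8d) are made up for illustration and are not from the original report; the lengths are kept small so the script finishes quickly.

```python
import re
import time

# The shortened pattern from the message: a '+' nested inside a '*'.
re_utf8 = re.compile(r'^([\x00-\x7f]+)*$')

def time_match(s):
    """Return (match_or_None, elapsed_seconds) for re_utf8 against s."""
    start = time.perf_counter()
    m = re_utf8.match(s)
    return m, time.perf_counter() - start

if __name__ == '__main__':
    # Hypothetical inputs: n ASCII characters followed by one non-ASCII
    # character (\x8d), so the overall match is guaranteed to fail.
    for n in (8, 12, 16, 18):
        m, elapsed = time_match('a' * n + '\x8d')
        print(n, 'matched' if m else 'no match', f'{elapsed:.4f}s')

    # Without the trailing \x8d the first attempt succeeds, so there is
    # nothing to backtrack over.
    m, elapsed = time_match('a' * 18)
    print('no \\x8d:', 'matched' if m else 'no match', f'{elapsed:.6f}s')
```

On a backtracking engine such as CPython's re, the failing cases should get noticeably slower with each added character, while the successful match stays fast. Exact timings depend on the machine and Python version.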
Date | User | Action | Args
2012-11-27 00:42:18 | tim.peters | set | recipients: + tim.peters, lpd, ezio.melotti, mrabarnett
2012-11-27 00:42:18 | tim.peters | set | messageid: <1353976938.17.0.842131609007.issue16563@psf.upfronthosting.co.za>
2012-11-27 00:42:18 | tim.peters | link | issue16563 messages
2012-11-27 00:42:17 | tim.peters | create |