Author mrabarnett
Recipients akuchling, amaury.forgeotdarc, georg.brandl, jimjjewett, mark, mrabarnett, pitrou, rsc, timehorse
Date 2008-09-30.23:42:25
Message-id <1222818151.68.0.902851536744.issue2636@psf.upfronthosting.co.za>
In-reply-to
Content
The explanation of the zero-width bug is incorrect. What happens is this:

The functions for finditer(), findall(), etc. perform searches and want
the next one to continue from where the previous match ended. However,
if the match was zero-width, the next search would begin at the very
position where that match _started_, find the same empty match again,
and be stuck forever. Therefore, after a zero-width match the caller of
the search consumes a character. Unfortunately, that can result in a
character being 'missed'.
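A minimal sketch of the caller-side loop just described, written against the public `re` API rather than the actual C code; `naive_findall` is a hypothetical name used only for illustration. It resumes each search at the end of the previous match and bumps the position by one after an empty match, which shows how a non-empty match at the same position can be skipped.

```python
import re


def naive_findall(pattern, string):
    """Collect successive matches: each search resumes where the previous
    match ended, and after a zero-width match the caller consumes one
    character so the loop cannot get stuck (hypothetical helper)."""
    results = []
    pos = 0
    while pos <= len(string):
        m = pattern.search(string, pos)
        if m is None:
            break
        results.append(m.group())
        if m.end() == m.start():
            # Zero-width match: resuming at m.end() would find the same
            # empty match forever, so skip ahead by one character.
            pos = m.end() + 1
        else:
            pos = m.end()
    return results


# 'a*' matches the empty string at position 0, so the loop jumps to
# position 1 and never tries the 'b' alternative at position 0.
print(naive_findall(re.compile(r'a*|b'), 'b'))  # ['', '']
```

The 'b' is the "missed" character: it could only have been matched starting at position 0, which the loop has already stepped past.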

The bug in re.split() is also the result of an incorrect fix to this
zero-width problem.

I suggest that the regex code should include the fix for the zero-width
split bug; we can have code to turn it off unless a re.ZEROWIDTH flag is
present, if that's the decision.
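A rough sketch of how an opt-in switch for zero-width splitting might behave, under stated assumptions: `split_sketch` and its `zerowidth` argument are hypothetical, with `zerowidth` standing in for the proposed re.ZEROWIDTH flag (not a flag of Python's `re` module). Capturing groups and maxsplit are ignored, and the loop still uses the bump-along strategy, so it is an illustration of the switch, not of a corrected search.

```python
import re


def split_sketch(pattern, string, zerowidth=False):
    """Hypothetical split(): empty matches are ignored unless zerowidth
    (standing in for a proposed re.ZEROWIDTH flag) is true."""
    pieces = []
    last = 0  # end of the most recent split point
    pos = 0   # where the next search starts
    while pos <= len(string):
        m = pattern.search(string, pos)
        if m is None:
            break
        start, end = m.span()
        if start == end:
            if zerowidth:
                # Split at the empty match.
                pieces.append(string[last:start])
                last = end
            # Consume one character after a zero-width match.
            pos = end + 1
        else:
            pieces.append(string[last:start])
            last = pos = end
    pieces.append(string[last:])
    return pieces


word_boundary = re.compile(r'\b')
print(split_sketch(word_boundary, 'a b'))                  # ['a b']
print(split_sketch(word_boundary, 'a b', zerowidth=True))  # ['', 'a', ' ', 'b', '']
```

With the switch off, the word-boundary pattern never splits anything; with it on, every boundary becomes a split point.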

The patch issue2636+01+09-02+17+18+19+20+21+24+26_speedup.diff includes
some speedups.
History
Date                 User        Action  Args
2008-09-30 23:42:32  mrabarnett  set     recipients: + mrabarnett, akuchling, georg.brandl, jimjjewett, amaury.forgeotdarc, pitrou, rsc, timehorse, mark
2008-09-30 23:42:31  mrabarnett  set     messageid: <1222818151.68.0.902851536744.issue2636@psf.upfronthosting.co.za>
2008-09-30 23:42:30  mrabarnett  link    issue2636 messages
2008-09-30 23:42:29  mrabarnett  create