Message46355
I picked through CVS, python-dev and Google and came up with
this. The current behavior was present way back in the
earliest regsub.py in CVS (dated Sep 1992); subsequent
implementations seem to mirror this behavior. The CVS
comment back in 1992 described split as modeled on nawk. A
check of nawk(1) confirms that nawk only splits on non-null
matches. Perl (circa 5.6), on the other hand, appears to
split the way this patch does (though I wasn't aware of that
when I wrote the patch) so that might argue in the other
direction. I would note, too, that re.findall and
re.finditer tend in this direction ("Empty matches are
included in the result unless they touch the beginning of
another match.").
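As a small illustration of the re.findall behavior quoted above (a
sketch only; the sample pattern and string are made up for this
comment, and the exact list may vary between Python versions):

```python
import re

# 'x*' can match the empty string, so findall reports empty matches
# alongside the non-empty match 'x'.
matches = re.findall(r'x*', 'axb')
print(matches)
```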
The python-dev archive doesn't seem to go back far enough to
be relevant and I'm not sure how to search it. General
googling (python "re.split" empty match) found a few hits.
Probably the most relevant is Tim Peters saying "Python
won't change here (IMO)" and giving the example that he also
gives in a comment to bug #852532 (which this patch
addresses). He also wonders in his comment about the
possibility of a "design constraint", but I think this patch
addresses that concern.
As far as I can tell, the current behavior was a design
decision made over 10 years ago, between two alternatives
that probably didn't matter much at the time. Skipping
empty matches probably seemed harmless before
lookahead/lookbehind assertions. Now, though, the current
behavior seems like a significant hindrance. Furthermore,
it ought to be pretty trivial to modify any existing
patterns to get the old behavior, should that be desired
(e.g., use 'x+' instead of 'x*').
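To make the lookahead case concrete, here is a minimal sketch. It is
not the patch itself: split_on_all_matches is a hypothetical helper
written for this comment, and it simply splits at every match that
re.finditer reports, empty or not. The second part shows the 'x+'
workaround for anyone who wants splitting on non-empty matches only.

```python
import re

def split_on_all_matches(pattern, string):
    """Hypothetical helper: split at every match, including empty ones."""
    pieces, last = [], 0
    for m in re.finditer(pattern, string):
        pieces.append(string[last:m.start()])
        last = m.end()
    pieces.append(string[last:])
    return pieces

# A zero-width lookahead only ever produces empty matches; here it
# splits just before each capital letter.
camel = split_on_all_matches(r'(?=[A-Z])', 'splitOnEmptyMatches')
print(camel)  # ['split', 'On', 'Empty', 'Matches']

# 'x+' can never match the empty string, so it splits only on runs of x.
plus = re.split(r'x+', 'axxb')
print(plus)  # ['a', 'b']
```

A split that skips empty matches cannot use the lookahead pattern at
all, which is the hindrance described above.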
(I didn't notice that re.findall doc when I originally wrote
this patch. Perhaps the doc in the patch should be slightly
modified to help emphasize the similarity between how
re.findall and re.split handle empty matches.)

Date | User | Action | Args
2007-08-23 15:38:35 | admin | link | issue988761 messages
2007-08-23 15:38:35 | admin | create |