Date 2004-07-28
I picked through CVS, python-dev and Google and came up with
this.  The current behavior was present way back in the
earliest revision in CVS (dated Sep 1992); subsequent
implementations seem to be mirroring this behavior.  The CVS
comment back in 1992 described split as modeled on nawk.  A
check of nawk(1) confirms that nawk only splits on non-null
matches.  Perl (circa 5.6), on the other hand, appears to
split the way this patch does (though I wasn't aware of that
when I wrote the patch) so that might argue in the other
direction.  I would note, too, that re.findall and
re.finditer tend in this direction ("Empty matches are
included in the result unless they touch the beginning of
another match.").
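
To illustrate the re.findall/re.finditer point, here is a
small sketch.  The sample string and the lookahead pattern
are made up for illustration; they are not taken from the
patch or the bug report.  A pattern consisting only of a
lookahead can never consume a character, so every match it
produces is empty, and both functions still report them.

```python
import re

text = "fooBarBaz"
# Zero-width lookahead: it matches the empty string just
# before each uppercase letter and never consumes anything.
pattern = r"(?=[A-Z])"

# findall returns the matched text, which is empty each time.
empty_matches = re.findall(pattern, text)

# finditer exposes where those empty matches occurred.
positions = [m.start() for m in re.finditer(pattern, text)]

print(empty_matches)
print(positions)
```

The two empty matches fall at offsets 3 and 6, i.e. just
before "Bar" and "Baz".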

The python-dev archive doesn't seem to go back far enough to
be relevant and I'm not sure how to search it.  General
googling (python "re.split" empty match) found a few hits. 
Probably the most relevant is Tim Peters saying "Python
won't change here (IMO)" and giving the example that he also
gives in a comment to bug #852532 (which this patch
addresses).  He also wonders in his comment about the
possibility of a "design constraint", but I think this patch
addresses that concern.

As far as I can tell, the current behavior was a design
decision made over 10 years ago, between two alternatives
that probably didn't matter much at the time.  Skipping
empty matches probably seemed harmless before
lookahead/lookbehind assertions.  Now, though, the current
behavior seems like a significant hindrance.  Furthermore,
it ought to be pretty trivial to modify any existing
patterns to get the old behavior, should that be desired
(e.g., use 'x+' instead of 'x*').
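
A rough sketch of both halves of that argument follows.  The
sample strings are hypothetical.  The 'x+' call behaves the
same whether or not split honors empty matches, since 'x+'
can never match the empty string.  The lookahead call is
guarded by a version check, on the assumption that the
interpreter splits on empty matches only from Python 3.7 on;
older interpreters either ignore the empty matches or reject
the pattern.

```python
import re
import sys

# 'x+' only ever produces non-empty matches, so this result
# does not depend on how split treats empty matches.
non_empty_only = re.split("x+", "axxbxc")
print(non_empty_only)  # ['a', 'b', 'c']

# Splitting on a pure lookahead relies on empty matches being
# honored.  Assumption: that holds on Python 3.7 and later.
camel_parts = None
if sys.version_info >= (3, 7):
    camel_parts = re.split(r"(?=[A-Z])", "fooBarBaz")
    print(camel_parts)  # ['foo', 'Bar', 'Baz']
```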

(I didn't notice that re.findall doc when I originally wrote
this patch.  Perhaps the doc in the patch should be slightly
modified to help emphasize the similarity between how
re.findall and re.split handle empty matches.)
