
Author Fj
Recipients Fj, docs@python
Date 2012-05-09.11:30:36
Message-id <1336563037.57.0.851636000616.issue14763@psf.upfronthosting.co.za>
In-reply-to
Content
string.split documentation says:

> The optional third argument maxsplit defaults to 0. If it is nonzero, at most maxsplit number of splits occur, and the remainder of the string is returned as the final element of the list (thus, the list will have at most maxsplit+1 elements).

It lies! If you give it maxsplit=0 it doesn't do any splits at all! It should say:

> The optional third argument maxsplit defaults to **-1**. If it is **nonnegative**, at most maxsplit number of splits occur, ...

Additionally, it could specify default values in the function signature explicitly, like re.split does:

    string.split(s, sep=None, maxsplit=-1)

instead of

    string.split(s[, sep[, maxsplit]])

It seems that the inconsistency stems from a time long forgotten (certainly before 2.5), when string.split used the (now obsolete) implementation in stropmodule.c, which does indeed use maxsplit=0 (and on which the re.split convention was based, regrettably).
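To illustrate the re.split convention mentioned above, here is a minimal sketch (the sample string and variable names are made up for illustration). In re.split, maxsplit=0 is the default and means "no limit", which is the opposite of what 0 means for str.split:

```python
import re

text = "a b c d"  # illustrative sample input, not from the docs

# In re.split, maxsplit=0 (the default) means "no limit on splits".
unlimited = re.split(r" ", text, maxsplit=0)

# A positive maxsplit caps the number of splits.
one_split = re.split(r" ", text, maxsplit=1)

print(unlimited)  # ['a', 'b', 'c', 'd']
print(one_split)  # ['a', 'b c d']
```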

Currently string.split just calls str.split, and that uses maxsplit=-1 to mean unlimited splits.
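A quick way to check this is to call str.split directly. The sketch below uses a made-up sample string; since string.split delegates to str.split, the same results should be expected from the module function:

```python
text = "a b c d"  # illustrative sample input

default_split = text.split(" ")   # maxsplit omitted: unlimited splits
unlimited = text.split(" ", -1)   # -1 also means unlimited splits
no_split = text.split(" ", 0)     # 0 means no splits at all
two_splits = text.split(" ", 2)   # at most 2 splits, so at most 3 elements

print(default_split)  # ['a', 'b', 'c', 'd']
print(unlimited)      # ['a', 'b', 'c', 'd']
print(no_split)       # ['a b c d']
print(two_splits)     # ['a', 'b', 'c d']
```

So a maxsplit of 0 is honoured literally, and only a negative value behaves like "no limit".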

From searching for "maxsplit" in the bug tracker, I understand that the split functions have had a rather difficult history, with some quirks preserved for the sake of backward compatibility and left undocumented for the sake of brevity. In this case, however, the documentation does try to describe the particular behaviour but gets it wrong, which is really confusing.

Also, maybe an even better fix would be to change the str.split documentation to use the proper signature (`str.split(sep=None, maxsplit=-1)`), and simply say that string.split(s, sep=None, maxsplit=-1) calls s.split(sep, maxsplit) here? Because that's what it does, while having _two_ different, incomplete, partially wrong explanations of the same thing is confusing!
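As a rough sketch of what that one-line description amounts to: the function name string_split below is a hypothetical stand-in for the string.split module function (which only exists in Python 2), written so that it runs on any current Python. It is not the actual library source, just the delegation described above with the defaults spelled out:

```python
def string_split(s, sep=None, maxsplit=-1):
    """Hypothetical stand-in for string.split: just delegate to str.split."""
    return s.split(sep, maxsplit)


print(string_split("a b c"))              # ['a', 'b', 'c']
print(string_split("a,b,c", ",", 1))      # ['a', 'b,c']
print(string_split("a b c", maxsplit=0))  # ['a b c']
```

With the defaults visible in the signature, the prose only needs to explain what a negative or nonnegative maxsplit does once, in the str.split documentation.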
History
Date | User | Action | Args
2012-05-09 11:30:37 | Fj | set | recipients: + Fj, docs@python
2012-05-09 11:30:37 | Fj | set | messageid: <1336563037.57.0.851636000616.issue14763@psf.upfronthosting.co.za>
2012-05-09 11:30:36 | Fj | link | issue14763 messages
2012-05-09 11:30:36 | Fj | create |