classification
Title: re.split() should behave like string.split() for maxsplit=0 and maxsplit=-1
Type: enhancement
Stage:
Components: Library (Lib)
Versions: Python 3.3
process
Status: closed
Resolution: wont fix
Dependencies:
Superseder:
Assigned To:
Nosy List: acg, ezio.melotti, rhettinger, terry.reedy
Priority: normal
Keywords:

Created on 2011-11-05 02:46 by acg, last changed 2011-12-28 05:15 by rhettinger. This issue is now closed.

Messages (5)
msg147066 Author: Alan Grow (acg) Date: 2011-11-05 02:46
If you split a string in a maximum of zero places, you should get the original string back. s.split(sep,0) behaves this way. But re.split(r,s,0) performs an unlimited number of splits in this case.

To get an unlimited number of splits, s.split(sep,-1) is a sensible choice. But in this case re.split(r,s,-1) performs zero splits.

Where's the sense in this?

>>> import string, re
>>> string.split("foo bar baz"," ",0)
['foo bar baz']
>>> re.split("\s+","foo bar baz",0)
['foo', 'bar', 'baz']
>>> string.split("foo bar baz"," ",-1)
['foo', 'bar', 'baz']
>>> re.split("\s+","foo bar baz",-1)
['foo bar baz']
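For readers on Python 3, where the string.split() function no longer exists, the same comparison can be sketched with the str.split() method. This is an illustrative restatement of the transcript above, not part of the original report; maxsplit is passed by keyword to re.split() because that is the spelling the re documentation uses.

```python
import re

text = "foo bar baz"

# str.split: maxsplit=0 means "no splits", maxsplit=-1 means "no limit".
str_zero = text.split(" ", 0)
str_neg = text.split(" ", -1)

# re.split: maxsplit=0 is the default and means "no limit".
re_zero = re.split(r"\s+", text, maxsplit=0)

# re.split with a negative maxsplit is not documented; the report above
# observed zero splits, so the result is printed but not relied upon here.
re_neg = re.split(r"\s+", text, maxsplit=-1)

print(str_zero, str_neg)
print(re_zero, re_neg)
```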
msg147067 Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2011-11-05 03:03
This is a known issue, but I don't think it can be fixed without breaking backward compatibility. The behavior with negative values is not explicitly documented, so I would consider it an implementation detail. The behavior with positive values is documented for both functions. Also, even if it's inconsistent, I would expect people to request at least one split; otherwise they are basically asking for a no-op.
I suggest closing this as wontfix.
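Since re.split() itself cannot change without breaking callers, code that wants str.split()-style semantics can wrap it instead. The following is a minimal sketch under that assumption; the name split_like_str is hypothetical and not part of the re module.

```python
import re

def split_like_str(pattern, string, maxsplit=-1, flags=0):
    """Hypothetical helper: re.split() with str.split()'s maxsplit semantics."""
    if maxsplit == 0:
        # str.split semantics: zero splits returns the string unchanged.
        return [string]
    if maxsplit < 0:
        # str.split semantics: negative means no limit, which re.split
        # spells as maxsplit=0.
        maxsplit = 0
    return re.split(pattern, string, maxsplit=maxsplit, flags=flags)

print(split_like_str(r"\s+", "foo bar baz", 0))
print(split_like_str(r"\s+", "foo bar baz", -1))
```

The wrapper never passes a negative value through to re.split(), so it does not depend on the undocumented behavior.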
msg147542 Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2011-11-13 03:04
The two methods are defined differently, and act as defined, so this is a feature request, not a bug report.

str.split([sep[, maxsplit]]) 
... If maxsplit is given, at most maxsplit splits are done (thus, the list will have at most maxsplit+1 elements). If maxsplit is not specified, then there is no limit on the number of splits (all possible splits are made).

re.split(pattern, string, maxsplit=0, flags=0)
...If maxsplit is nonzero, at most maxsplit splits occur,

Clearly, if maxsplit for re.split is the default of 0, it must do all splits. There is a difference between being optional with no default (possible with C-coded functions) and with a default.
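The two documented definitions quoted above agree for positive values and for an omitted maxsplit; they differ only in how "no limit" can be written explicitly. A short sketch in Python 3 syntax (the variable names are illustrative):

```python
import re

text = "foo bar baz"

# Positive maxsplit: both functions stop after that many splits.
str_one = text.split(" ", 1)
re_one = re.split(r"\s+", text, maxsplit=1)

# Omitted maxsplit: both functions make every possible split.
str_all = text.split(" ")
re_all = re.split(r"\s+", text)

# Only re.split lets the caller spell "omitted" as an explicit 0,
# because 0 is its documented default.
re_explicit_zero = re.split(r"\s+", text, maxsplit=0)

print(str_one, re_one)
print(str_all, re_all, re_explicit_zero)
```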

Logically, both should have a default of None, meaning no limit. But I agree with Ezio and do not see that happening for Python 3.

As for negative values, I would have maxsplit treated as a count and make negative values a ValueError.
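The design described in the last two paragraphs (a default of None meaning no limit, maxsplit treated strictly as a count, and negative values rejected) could look roughly like this. It is only a sketch of the idea; split_counted is a hypothetical name, and nothing like it was added to the standard library as a result of this issue.

```python
import re

def split_counted(pattern, string, maxsplit=None, flags=0):
    """Hypothetical: maxsplit is a count of splits; None means no limit."""
    if maxsplit is None:
        # No limit: re.split spells this as maxsplit=0.
        return re.split(pattern, string, maxsplit=0, flags=flags)
    if maxsplit < 0:
        raise ValueError("maxsplit must be non-negative")
    if maxsplit == 0:
        # A count of zero splits leaves the string whole.
        return [string]
    return re.split(pattern, string, maxsplit=maxsplit, flags=flags)

print(split_counted(r"\s+", "foo bar baz"))
print(split_counted(r"\s+", "foo bar baz", 0))
```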
msg147545 Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2011-11-13 03:08
Terry, thanks for closing this.  The API for str.split() has been set in stone for a very long time.
msg150281 Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2011-12-28 05:15
I concur with closing this one.
History
Date User Action Args
2011-12-28 05:15:57 rhettinger set messages: + msg150281
2011-11-13 03:08:59 rhettinger set messages: + msg147545
2011-11-13 03:04:51 terry.reedy set status: open -> closed; versions: - Python 2.7, Python 3.2; nosy: + terry.reedy; messages: + msg147542; type: behavior -> enhancement
2011-11-05 03:03:22 ezio.melotti set versions: + Python 2.7, Python 3.2, Python 3.3, - Python 2.6; nosy: + rhettinger, ezio.melotti; messages: + msg147067; resolution: wont fix
2011-11-05 02:46:35 acg create