Title: bytes.split should have same interface as str.split, or different name
msg55723 - (view) Author: Nir Soffer (nirs) * Date: 2007-09-07 01:30
>>> b'foo  bar'.split()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: split() takes at least 1 argument (0 given)

>>> b'foo  bar'.split(None)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: expected an object with the buffer interface

str.split and bytes.split should have the same interface, or different names.
msg55731 - (view) Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2007-09-07 06:55
I don't think so. They can't have the same behavior, and "split" is the most
reasonable name for what the bytes method does.

There have always been subtle differences between the behavior of string and
unicode methods; this was even more objectionable because they were supposed to
be interchangeable to some degree; 3k strings and bytes are not.
msg55733 - (view) Author: Nir Soffer (nirs) * Date: 2007-09-07 14:38
Why shouldn't bytes use the same default whitespace split behavior as str?
msg55734 - (view) Author: Walter Dörwald (doerwalter) * (Python committer) Date: 2007-09-07 15:16
Because it's not clear whether b'\xa0' *is* whitespace or not. Bytes
have no meaning, characters do.
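
A small sketch of the ambiguity, assuming a current Python 3 interpreter. The variable names are illustrative only; the point is that the same byte is whitespace, a letter, or invalid depending on the encoding chosen:

```python
raw = b"\xa0"

# Latin-1 maps 0xA0 to NO-BREAK SPACE, which str treats as whitespace.
as_latin1 = raw.decode("latin-1")
latin1_is_space = as_latin1.isspace()

# Code page 437 maps the same byte to a letter (a with acute accent).
as_cp437 = raw.decode("cp437")
cp437_is_space = as_cp437.isspace()

# On its own, the byte is not valid UTF-8 at all.
try:
    raw.decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False

print(latin1_is_space, repr(as_cp437), cp437_is_space, valid_utf8)
```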
msg55737 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2007-09-07 17:27
I tend to agree with the author; I've run into this myself. For
whitespace, I propose to use only the following: tab LF FF VT CR space.
These are the whitespace ASCII characters according to isspace() in libc.

(Unicode also treats hex 1C, 1D, 1E and 1F as whitespace; I have no idea
what these mean. In practice I don't think it matters either way.)
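
For illustration, here is how the six-character set behaves in current Python 3 releases (an assumption about the reader's interpreter, not a reproduction of the patch). ASCII_WHITESPACE is a name made up for this sketch:

```python
import string

# The six ASCII whitespace bytes named above: space, tab, LF, CR, VT, FF.
ASCII_WHITESPACE = b" \t\n\r\x0b\x0c"
same_as_string_module = set(ASCII_WHITESPACE) == set(string.whitespace.encode("ascii"))

# With no separator, bytes.split() treats runs of those bytes as one gap.
parts = b"foo \t\n\r\x0b\x0c bar".split()

# 0x1C..0x1F separate words in str, but are ordinary bytes in bytes.
split_as_bytes = b"a\x1cb".split()
split_as_str = "a\x1cb".split()

print(same_as_string_module, parts, split_as_bytes, split_as_str)
```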
msg55745 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2007-09-08 00:35
Here's a patch that fixes bytes.split and .rsplit.  I'll hold off for a
while in case there's strong disagreement.  I might add a patch for
bytes.strip later (it's simpler).
msg55746 - (view) Author: Stefan Sonnenberg-Carstens (pythonmeister) Date: 2007-09-08 10:25
I also agree that strings and bytes (a list of bytes) should have
the same interface.
When talking about strings, most programmers think of the list of
bytes composing them (char *).
So the abbreviation should also hold true in Python.
msg55750 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2007-09-08 16:44
Updated patch that also modifies bytes.*strip().
msg55751 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2007-09-08 17:05
New version with corrected docstrings and buffer support for *split() as
well.  Added unittests.
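
A rough sketch of what these changes look like from the caller's side, as observed in current Python 3 releases rather than taken from the patch itself. The sample data and variable names are invented for the example:

```python
line = b"  key = value \r\n"

# With no argument, the strip family removes ASCII whitespace.
stripped = line.strip()
left_stripped = line.lstrip()
right_stripped = line.rstrip()

# rsplit with sep=None splits on whitespace runs, starting from the right.
last_field = b"a b  c".rsplit(None, 1)

# An explicit separator may be any bytes-like object, e.g. a bytearray.
fields = b"a,b,c".split(bytearray(b","))

print(stripped, last_field, fields)
```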
msg55787 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2007-09-10 16:53
Committed revision 58093.
