classification
Title: bytes.split should have same interface as str.split, or different name
Type: behavior Stage:
Components: Interpreter Core Versions: Python 3.0
process
Status: closed Resolution: accepted
Dependencies: Superseder:
Assigned To: gvanrossum Nosy List: doerwalter, georg.brandl, gvanrossum, nirs, pythonmeister
Priority: normal Keywords: patch

Created on 2007-09-07 01:30 by nirs, last changed 2007-09-10 16:53 by gvanrossum. This issue is now closed.

Files
File name Uploaded Description Edit
bytes-split.diff gvanrossum, 2007-09-08 17:05
Messages (10)
msg55723 - (view) Author: Nir Soffer (nirs) * Date: 2007-09-07 01:30
>>> b'foo  bar'.split()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: split() takes at least 1 argument (0 given)

>>> b'foo  bar'.split(None)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: expected an object with the buffer interface

str.split and bytes.split should have the same interface, or different 
names.
msg55731 - (view) Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2007-09-07 06:55
I don't think so. They can't have the same behavior, and "split" is the most
reasonable name for what the bytes method does.

There have always been subtle differences between the behavior of string and
unicode methods; this was even more objectionable because they were supposed to
be interchangeable to some degree; 3k strings and bytes are not.
msg55733 - (view) Author: Nir Soffer (nirs) * Date: 2007-09-07 14:38
Why shouldn't bytes use a default whitespace split behavior, as str does?
msg55734 - (view) Author: Walter Dörwald (doerwalter) * (Python committer) Date: 2007-09-07 15:16
Because it's not clear whether b'\xa0' *is* whitespace or not. Bytes
have no meaning, characters do.
msg55737 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2007-09-07 17:27
I tend to agree with the author; I've run into this myself. For
whitespace, I propose to use only the following: tab LF FF VT CR space.
These are the whitespace ASCII characters according to isspace() in libc.

(Unicode also treats hex 1C, 1D, 1E and 1F as whitespace; I have no idea
what these mean. In practice I don't think it matters either way.)
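
As an illustration of the whitespace rule proposed above, here is a minimal Python sketch. The helper `split_ascii_whitespace` is hypothetical (it is not the committed patch); it splits only on the six ASCII whitespace bytes listed above and is compared with the built-in behavior of current Python 3 releases, which is assumed here rather than taken from this thread.

```python
# ASCII whitespace as proposed above: tab, LF, VT, FF, CR, space.
ASCII_WHITESPACE = b"\t\n\x0b\x0c\r "


def split_ascii_whitespace(data):
    """Hypothetical helper: split on runs of ASCII whitespace only."""
    fields = []
    current = bytearray()
    for byte in data:  # iterating over bytes yields ints
        if byte in ASCII_WHITESPACE:
            # End of a field; runs of whitespace produce no empty fields.
            if current:
                fields.append(bytes(current))
                current.clear()
        else:
            current.append(byte)
    if current:
        fields.append(bytes(current))
    return fields


sample = b"foo  bar\tbaz\xa0qux\x1cend"

# 0xA0 and 0x1C are not ASCII whitespace, so they stay inside the last field.
print(split_ascii_whitespace(sample))

# Current Python 3: bytes.split() with no argument follows the same rule.
print(sample.split())

# str.split() uses the Unicode definition, which includes U+00A0 and U+001C.
print("foo\xa0bar\x1cbaz".split())
```

The comparison shows why the two types can share an interface while still differing in what counts as whitespace: the bytes version never has to guess an encoding.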
msg55745 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2007-09-08 00:35
Here's a patch that fixes bytes.split and .rsplit.  I'll hold off for a
while in case there's strong disagreement.  I might add a patch for
bytes.strip later (it's simpler).
msg55746 - (view) Author: Stefan Sonnenberg-Carstens (pythonmeister) Date: 2007-09-08 10:25
IMHO I also agree that strings and bytes (list of bytes) should have
the same interface.
It is common sense that, when talking about strings, most programmers think
of the list of bytes composing them (char *).
So the abbreviation should also hold true in Python.
msg55750 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2007-09-08 16:44
Updated patch that also modifies bytes.*strip().
msg55751 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2007-09-08 17:05
New version with corrected docstrings and buffer support for *split() as
well.  Added unittests.
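
A short sketch of what the patch descriptions above amount to in practice: no-argument `split()`, `rsplit()` and `strip()` on bytes, plus a separator passed as another buffer-style object. This reflects the behavior of current Python 3 releases, which is an assumption here; the exact behavior at the committed revision is not shown in this thread. The sample data is made up for illustration.

```python
# Hypothetical sample data, for illustration only.
line = b"  key = some value \r\n"

# strip() with no argument removes leading and trailing ASCII whitespace.
stripped = line.strip()
print(stripped)

# split() and rsplit() with no separator (or None) split on whitespace runs.
print(stripped.split())
print(stripped.rsplit(None, 1))

# The separator may be another bytes-like object, such as a bytearray.
separator = bytearray(b"=")
key, value = (part.strip() for part in stripped.split(separator, 1))
print(key, value)
```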
msg55787 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2007-09-10 16:53
Committed revision 58093.
History
Date User Action Args
2007-09-10 16:53:53 gvanrossum set status: open -> closed
resolution: accepted
messages: + msg55787
2007-09-08 17:05:38 gvanrossum set files: - bytes-split.diff
2007-09-08 17:05:22 gvanrossum set files: - bytes-split.diff
2007-09-08 17:05:10 gvanrossum set files: + bytes-split.diff
messages: + msg55751
2007-09-08 16:44:22 gvanrossum set files: + bytes-split.diff
messages: + msg55750
2007-09-08 10:25:22 pythonmeister set nosy: + pythonmeister
messages: + msg55746
2007-09-08 00:35:55 gvanrossum set type: enhancement -> behavior
components: + Interpreter Core, - Library (Lib)
2007-09-08 00:35:22 gvanrossum set keywords: + patch
files: + bytes-split.diff
messages: + msg55745
2007-09-07 17:28:16 gvanrossum set messages: - msg55725
2007-09-07 17:27:24 gvanrossum set assignee: gvanrossum
messages: + msg55737
nosy: + gvanrossum
2007-09-07 15:16:31 doerwalter set nosy: + doerwalter
messages: + msg55734
2007-09-07 14:38:54 nirs set messages: + msg55733
2007-09-07 06:55:26 georg.brandl set nosy: + georg.brandl
messages: + msg55731
2007-09-07 02:03:43 nirs set type: enhancement
messages: + msg55725
2007-09-07 01:30:28 nirs create