Message 243640 - Python tracker



Author	swanson
Recipients	docs@python, swanson
Date	2015-05-20.04:02:41
Message-id	<1432094562.27.0.346738352121.issue24243@psf.upfronthosting.co.za>

Content

Background:
-----------
Perhaps this has been addressed elsewhere, but I couldn't find it.  To me, semantically, the whole idea of finding nothing, whether in something or in nothing, is complete and utter nonsense.  I'm a fail-quickly, fail-loudly kind of guy, and I'd prefer that any attempt to find nothing raise an exception.

But for whatever reason, the following behavior has long existed:
>>> "".index("") == "A".index("") == 0
True
>>> "" in "" and b"" in b"" and b"" in bytearray(b"")
True
>>> "" in "A" and b"" in b"A" and b"" in bytearray(b"A")
True


The Problem:
------------
The various string types (str, bytes, bytearray) all have the following methods: count, find, rfind, index, rindex
Each of these methods accepts optional parameters "start" and "end".  The documentation for each says something like "Optional arguments start and end are interpreted as in slice notation."  This is not always the case.

On the one hand:
>>> "".find("") == ""[0:0].find("") == "".find("", 0, 0) == 0
True

Consistent so far, however:
>>> "".find("") == ""[1:0].find("") == 0 and "".find("", 1, 0) == -1
True

So, you see that 'start' and 'end' are NOT in all cases interpreted the same way as slicing.  This problem has been around forever, affecting both Python 3 and 2, so I don't know how many people's code you'd break if you changed the behavior to make it consistent with the docs.  But if it's not going to be changed, it should at least be a well-documented "feature" of the functions with a line or two of explanation in the relevant docs:
https://docs.python.org/3/library/stdtypes.html#bytes-and-bytearray-operations
https://docs.python.org/3/library/stdtypes.html#string-methods
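
To make the comparison above concrete, here is a minimal sketch that contrasts find() with what a literal reading of "as in slice notation" would give.  The helper names find_like_slice and mismatches are hypothetical, made up for this illustration; they are not part of Python.

```python
def find_like_slice(s, sub, start=None, end=None):
    """Hypothetical helper: find() as if start/end were true slice bounds."""
    # Slicing clamps out-of-range bounds, so s[start:end] never fails.
    pos = s[start:end].find(sub)
    if pos == -1:
        return -1
    # Translate the offset inside the slice back to an index in s.
    offset = slice(start, end).indices(len(s))[0]
    return pos + offset


def mismatches(s, sub, bounds):
    """Return the (start, end) pairs where s.find disagrees with the helper."""
    return [
        (a, b)
        for a, b in bounds
        if s.find(sub, a, b) != find_like_slice(s, sub, a, b)
    ]


# Only the out-of-range pair (1, 0) is reported for the empty string.
print(mismatches("", "", [(0, 0), (1, 0)]))
```

For ordinary in-range bounds the two agree, e.g. "abc".find("c", 1, 3) and find_like_slice("abc", "c", 1, 3) both give 2; the difference only shows up when 'start' lies past the end of the string.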

The built-in types bytes, bytearray, and str also have the functions "startswith" and "endswith" that also take optional 'start' and 'end' arguments.  The documentation does not specifically say (as for count, (r)find, and (r)index) that these are "interpreted as in slice notation".  Instead, it says: "With optional start, test string beginning at that position. With optional end, stop comparing string at that position."  That wording is equally confusing and erroneous.  The natural interpretation of that would lead you to believe that, unlike slicing:
"A".startswith("A",0,0) == True
However, it actually returns False, because the 'end' index is exclusive, just as in slicing.
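
A short sketch of that point, comparing the bounded call with the equivalent slice (the variable names are only for illustration):

```python
s = "A"

# 'end' is exclusive, so the range 0..0 covers no characters at all,
# exactly like the slice s[0:0], which is the empty string.
with_bounds = s.startswith("A", 0, 0)
via_slice = s[0:0].startswith("A")

# Widening 'end' to 1 brings the first character into range.
with_wider_end = s.startswith("A", 0, 1)

print(with_bounds, via_slice, with_wider_end)
```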

Now, as to the interpretation of finding nothing, it's a mixed bag:

For str:
>>> "".startswith("",0,0)
True
>>> "".startswith("",1,0)
True
>>> "".endswith("",0,0)
True
>>> "".endswith("",1,0)
True

For bytes: (and the same for bytearray)
>>> b"".startswith(b"",0,0)
True
>>> b"".startswith(b"",1,0)
False
>>> b"".endswith(b"",0,0)
True
>>> b"".endswith(b"",1,0)
False
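
To reproduce the comparison above on a given interpreter, a sketch along these lines could be used.  The function name empty_match_table is hypothetical, and the results for the (1, 0) bounds are deliberately not hard-coded here, since they may differ between Python versions.

```python
def empty_match_table():
    """Collect startswith/endswith results for an empty needle in an empty haystack."""
    bounds = [(0, 0), (1, 0)]
    table = {}
    for make in (str, bytes, bytearray):
        empty = make()  # "", b"", or bytearray(b"")
        for start, end in bounds:
            table[(make.__name__, start, end)] = (
                empty.startswith(empty, start, end),
                empty.endswith(empty, start, end),
            )
    return table


# Each row is (type name, start, end) followed by (startswith, endswith).
for key, result in empty_match_table().items():
    print(key, result)
```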

History
Date	User	Action	Args
2015-05-20 04:02:42	swanson	set	recipients: + swanson, docs@python
2015-05-20 04:02:42	swanson	set	messageid: <1432094562.27.0.346738352121.issue24243@psf.upfronthosting.co.za>
2015-05-20 04:02:42	swanson	link	issue24243 messages
2015-05-20 04:02:41	swanson	create