classification
Title: poor documentation for .startswith, .endswith
Type:
Stage:
Components: Documentation
Versions:
process
Status: open
Resolution:
Dependencies:
Superseder:
Assigned To: docs@python
Nosy List: aldwinaldwin, docs@python, holdenweb, steven.daprano, v+python
Priority: normal
Keywords:

Created on 2019-07-03 05:01 by v+python, last changed 2020-09-19 18:03 by iritkatriel.

Messages (9)
msg347179 - Author: Glenn Linderman (v+python) * Date: 2019-07-03 05:01
The documentation is reasonably clear regarding the first parameter, which can be a string or a tuple of strings to match at the start or end of a string.

However, the other two parameters are much less clear in their effect.

text = "Now the day is over"
text.startswith('the', 2, 8)

Does it produce True because 'w the' is at the beginning of the text[2:] ? Maybe. Or because there is an ending position, must it fail because it doesn't match all of text[2:8] ?

text.startswith('day', 8, 10)

Does this produce True because everything in day matches text[8:10] or must it always produce false for any value of text because the match is never as long as the prefix string?

text.startswith(('day', 'month', 'year'), 8, 12)

Can this ever match day or month, because it is comparing to text[8:12], or can only year match because of the start and end?

Is there a difference between the following:

text.startswith(('day', 'month', 'year'), 8, 12)
text[8:12].startswith(('day', 'month', 'year'))

If no difference, why does startswith even need the extra two parameters? Maybe only in performance?

If no difference, why doesn't the documentation describe it that way, so that it could be far clearer?

If there is a difference, what is the difference?

Similar questions for endswith.
msg347180 - Author: Glenn Linderman (v+python) * Date: 2019-07-03 05:06
Or is 

text.startswith(('day', 'month', 'year'), 8, 12)

the same as

text[8:12] in ('day', 'month', 'year')


What happens if the text doesn't have as many as 12 characters? What if it doesn't have more than 8 characters?
msg347185 - Author: Aldwin Pollefeyt (aldwinaldwin) * Date: 2019-07-03 07:30
* text.startswith(prefix, start, end) seems the same as text[start:end].startswith(prefix)
* text[start:end]  with end>len(text) seems no issue, so also not an issue for startswith
* text[8:12] in ('day', 'month', 'year') is not the same at all, rather:
def matches(text):
    for x in ('day', 'month', 'year'):
        if text[8:12].startswith(x):
            return True
    return False

Maybe indeed could add to the documentation that 'text.startswith(prefix, start, end)' is the same as 'text[start:end].startswith(prefix)'? Although seemed obvious for me.
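The three points above can be checked with a short sketch. It reuses the text from the original report; the variable names are only illustrative, and the comparison with slicing is made for non-empty prefixes only.

```python
text = "Now the day is over"
prefixes = ('day', 'month', 'year')

# For non-empty prefixes, the three-argument form agrees with slicing first.
same_as_slice = text.startswith(prefixes, 8, 12) == text[8:12].startswith(prefixes)

# An end index beyond len(text) is clamped, just as it is in slicing.
past_end = text.startswith('day', 8, 100)

# A start index beyond len(text) leaves nothing for a non-empty prefix to match.
past_start = text.startswith('day', 50, 100)

# Membership is a different test: the slice 'day ' must equal one of the items.
membership = text[8:12] in prefixes

print(same_as_slice, past_end, past_start, membership)  # True True False False
```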
msg347187 - Author: Steve Holden (holdenweb) * (Python committer) Date: 2019-07-03 07:41
"Is the same as" is a little misleading - "gives the same result as" would be better, since there is little doubt actually slicing the subject strings would be massively less efficient in looping contexts.

The re module offers the start and end arguments to so many functions/methods for precisely this reason, so perhaps that module's documentation will contain helpful wording that could be copied or referenced.
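A minimal sketch of the looping context mentioned above. Both helpers are hypothetical names made up for illustration; they return the same positions, but the second one builds a new string on every iteration before testing it. No timings are claimed here.

```python
def find_all(text, prefix):
    """Return every index at which prefix occurs in text."""
    # Passing the offset as start avoids building text[i:] on each iteration.
    return [i for i in range(len(text)) if text.startswith(prefix, i)]


def find_all_sliced(text, prefix):
    """Same result, but each iteration copies the tail of the string first."""
    return [i for i in range(len(text)) if text[i:].startswith(prefix)]


text = "Now the day is over"
print(find_all(text, 'e'))         # [6, 17]
print(find_all_sliced(text, 'e'))  # [6, 17]
```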
msg347188 - Author: Aldwin Pollefeyt (aldwinaldwin) * Date: 2019-07-03 07:58
Modified from re module Pattern.search:
--------
The optional second parameter 'start' gives an index in the string where the search is to start; it defaults to 0.

The optional parameter 'end' limits how far the string will be searched; it will be as if the string is 'end' characters long, so only the characters from 'start' to 'end' - 1 will be searched for a match. If 'end' is less than 'start', no match will be found; otherwise, text.startswith(prefix, start, end) gives the same result as text[start:end](prefix).
------------

I don't think this is true like with re:
----
This is not completely equivalent to slicing the string; the '^' pattern character matches at the real beginning of the string and at positions just after a newline, but not necessarily at the index where the search is to start.
----
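The difference from re quoted above can be seen in a small sketch. The pattern and indices are chosen here for illustration and are not from the thread.

```python
import re

text = "Now the day is over"
pattern = re.compile(r'^the')

# '^' anchors at the real start of the string, so a search from index 4 finds nothing.
anchored_search = pattern.search(text, 4)

# Slicing first makes index 4 the real beginning, so the same pattern matches.
sliced_search = pattern.search(text[4:])

# str.startswith has no such anchor: start simply moves where the test begins.
plain = text.startswith('the', 4)

print(anchored_search)            # None
print(sliced_search is not None)  # True
print(plain)                      # True
```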
msg347189 - Author: Aldwin Pollefeyt (aldwinaldwin) * Date: 2019-07-03 08:00
correction:

... otherwise, text.startswith(prefix, start, end) gives the same result as text[start:end].startswith(prefix).
msg347194 - Author: Steven D'Aprano (steven.daprano) * (Python committer) Date: 2019-07-03 09:13
Here are links to the relevant docs:

https://docs.python.org/3/library/stdtypes.html#str.startswith
https://docs.python.org/3/library/stdtypes.html#str.endswith

Both say:

"With optional *start*, test string beginning at that position. With optional *end*, stop comparing string at that position."

That seems perfectly clear to me: if you pass a starting position, the *starts with* test (or ends with) considers the substring starting at that position. If you pass an ending position, the starts with test considers the substring ending at that position.

That makes it equivalent to text[start:end].startswith(prefix) except that no copying is done.

It seems that you are reading the start and end positions as somehow requiring that the prefix be an exact match for the given slice, e.g. when you ask:

> must it fail because it doesn't match all of text[2:8] ?

and later:

> can only year match because of the start and end?

These questions imply you think that the methods require the specified slice *equals* the given affix (prefix/suffix) rather than *start* or *end* with. That seems to me to be an unjustified interpretation of what the docs say.

In the absence of any evidence to the contrary, we are surely entitled to assume that the *startswith* method remains *startswith* regardless of whether a slice (start/end positions) is specified or not. Or to put it another way, it goes without saying that specifying a slice (start and/or end positions) doesn't change the semantics of the method, it only changes the starting and ending positions, precisely as already documented.


Glenn asked:

> text = "Now the day is over"
> text.startswith('the', 2, 8)
> Does it produce True because 'w the' is at the beginning of the text[2:] ?

No, it produces False, because text[2:8] does not start with "the", it starts with "w".

> Maybe. Or because there is an ending position, must it fail because it 
> doesn't match all of text[2:8] ?

It fails, but that's not why it fails. It fails because the substring doesn't start with the prefix, not because it doesn't equal the prefix.

> text.startswith('day', 8, 10)
> Does this produce True because everything in day matches text[8:10] 

No, it produces False because the substring in the half-open slice 8:10 does not start with "day".

> or must it always produce false for any value of text because the
> match is never as long as the prefix string?

Correct, since the slice is only 2 characters long, and the prefix is 3 characters long, hence the slice can never begin with that prefix.
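The answers above can be checked directly in an interpreter. The sketch below uses the text from the original report; the indices 4 and 11 are added here only for contrast and do not appear in the thread.

```python
text = "Now the day is over"

# The slice under test starts with 'w', not 'the'.
window = text[2:8]                          # 'w the '
the_from_2 = text.startswith('the', 2, 8)   # False

# Moving start to where 'the' actually begins makes the test succeed.
the_from_4 = text.startswith('the', 4, 8)   # True

# A two-character slice can never start with a three-character prefix.
day_short = text.startswith('day', 8, 10)   # False, the slice is 'da'

# Widening the slice by one character is enough.
day_exact = text.startswith('day', 8, 11)   # True, the slice is 'day'

print(repr(window), the_from_2, the_from_4, day_short, day_exact)
```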
msg347195 - Author: Steven D'Aprano (steven.daprano) * (Python committer) Date: 2019-07-03 09:19
Perhaps it would help if we spelled out the behaviour more explicitly?


str.startswith(prefix[, start=0[, end=len(string)]])

    Return True if the slice of string between start (defaults to the beginning of the string) and end (defaults to the end of the string) starts with the prefix, otherwise return False. prefix can also be a tuple of prefixes to look for.


(To be frank, I don't think we need to do this, I think the docs are fine as-is, but if others disagree perhaps this is an improvement.)
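One way to read the proposed wording is as a small pure-Python model. The sketch below is illustrative only: startswith_sketch is a hypothetical helper, not how CPython implements the method, and it is compared against str.startswith only for non-empty prefixes. Corner cases such as an empty prefix combined with out-of-range indices are not covered.

```python
def startswith_sketch(string, prefix, start=0, end=None):
    """Hypothetical model of the proposed wording, for non-empty prefixes."""
    if end is None:
        end = len(string)
    # "the slice of string between start and end"
    window = string[start:end]
    # "prefix can also be a tuple of prefixes to look for"
    prefixes = prefix if isinstance(prefix, tuple) else (prefix,)
    # "starts with the prefix": compare only the leading len(p) characters.
    return any(window[:len(p)] == p for p in prefixes)


text = "Now the day is over"
print(startswith_sketch(text, 'the', 2, 8))                       # False
print(startswith_sketch(text, ('day', 'month', 'year'), 8, 12))   # True
```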
msg347212 - Author: Glenn Linderman (v+python) * Date: 2019-07-03 10:31
Thanks for the explanations and suggestions. Now that I think I know what those parameters are used for...

Sorry, my first example was tweaked on the fly, and doesn't make as much sense as it could because it wound up being a mix of pre-tweaked and tweaked text, as Steven points out at the beginning of msg347194.

But the text he suggests in msg347195 would be an immense clarification to the existing text. The existing text is worded in such a way that it is not clear how the start and end parameters affect the search, except by analogy with other slicing operations in other parts of Python. Steven may be willing to draw such analogies to perceive that the current startswith documentation is clear, but if you go in with an open mind, uncluttered with the better-specified behavior of other Python operations, there are lots of possible interpretations. Describing the start/end parameters with defaults and explaining the whole operation as referring to the slice specified by those parameters makes it far less open to other interpretations.

The text Aldwin suggests in msg347188 (from re) is better than the original for startswith/endswith, but is not as clear as Steven's wording. I would actually suggest that Steven's wording could be the basis for an improvement for the re docs as quoted.

The second part, the "prefix can also be a tuple of prefixes to look for" could also be improved... neither prefix nor tuple of prefixes is defined as being a string.

Further, if the parameter syntax is shown with the defaults, then the parenthetical comments about (defaults to...) are not really necessary, simplifying the description to:

The prefix parameter can be a single string, or a tuple of strings.
Return True if the slice of string specified by [start:end] starts with any complete string supplied as part of the prefix parameter, otherwise return False.
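A short sketch of the same reading applied to endswith, together with the requirement that several candidates be passed as a tuple. The 'night' suffix and the list argument are made up here for illustration.

```python
text = "Now the day is over"

# endswith tests the end of the slice; text[0:11] is 'Now the day'.
whole = text.endswith('over')                        # True
sliced = text.endswith('day', 0, 11)                 # True
any_of = text.endswith(('night', 'day'), 0, 11)      # True, 'day' matches

# Several candidates must be given as a tuple; a list is rejected.
try:
    text.startswith(['day', 'month', 'year'], 8, 12)
    list_accepted = True
except TypeError:
    list_accepted = False

print(whole, sliced, any_of, list_accepted)  # True True True False
```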
History
Date                 User            Action  Args
2020-09-19 18:03:39  iritkatriel     set     assignee: docs@python; components: + Documentation; nosy: + docs@python
2019-07-03 10:31:26  v+python        set     messages: + msg347212
2019-07-03 09:19:29  steven.daprano  set     messages: + msg347195
2019-07-03 09:13:47  steven.daprano  set     nosy: + steven.daprano; messages: + msg347194
2019-07-03 08:00:22  aldwinaldwin    set     messages: + msg347189
2019-07-03 07:58:50  aldwinaldwin    set     messages: + msg347188
2019-07-03 07:41:19  holdenweb       set     nosy: + holdenweb; messages: + msg347187
2019-07-03 07:30:26  aldwinaldwin    set     nosy: + aldwinaldwin; messages: + msg347185
2019-07-03 05:06:37  v+python        set     messages: + msg347180
2019-07-03 05:01:35  v+python        create