Issue 31984: startswith and endswith leak implementation details

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/76165

classification

Title:	startswith and endswith leak implementation details
Type:	behavior	Stage:
Components:		Versions:	Python 3.7

process

Status:	open	Resolution:
Dependencies:		Superseder:
Assigned To:		Nosy List:	Ronan.Lamy, barry, r.david.murray, serhiy.storchaka, steven.daprano
Priority:	normal	Keywords:

Created on 2017-11-08 16:59 by Ronan.Lamy, last changed 2022-04-11 14:58 by admin.

Messages (8)
msg305881 - (view)	Author: Ronan Lamy (Ronan.Lamy) *	Date: 2017-11-08 16:59
One would think that u.startswith(v, start, end) would be equivalent to u[start: end].startswith(v), but one would be wrong. And the same goes for endswith(). Here is the actual spec (for bytes, but str and bytearray are the same), in the form of passing pytest+hypothesis tests:

from hypothesis import strategies as st, given

def adjust_indices(u, start, end):
    if end < 0:
        end = max(end + len(u), 0)
    else:
        end = min(end, len(u))
    if start < 0:
        start = max(start + len(u), 0)
    return start, end

@given(st.binary(), st.binary(), st.integers(), st.integers())
def test_startswith_3(u, v, start, end):
    if v:
        expected = u[start:end].startswith(v)
    else:
        start0, end0 = adjust_indices(u, start, end)
        expected = start0 <= len(u) and start0 <= end0
    assert u.startswith(v, start, end) is expected

@given(st.binary(), st.binary(), st.integers(), st.integers())
def test_endswith_3(u, v, start, end):
    if v:
        expected = u[start:end].endswith(v)
    else:
        start0, end0 = adjust_indices(u, start, end)
        expected = start0 <= len(u) and start0 <= end0
    assert u.endswith(v, start, end) is expected

Fixing this behaviour to work in the "obvious" way would be simple: just add a check for len(v) == 0 and always return True in that case.
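The discrepancy described above can be seen without hypothesis. The following is a minimal sketch, assuming CPython 3 semantics; the variable names are illustrative only and do not come from the report.

```python
# Minimal illustration of the empty-needle discrepancy (assumes CPython 3).
u = b"alpha"

# Slicing past the end yields an empty bytes object, which starts with b"".
sliced_past_end = u[20:30].startswith(b"")

# The three-argument form rejects a start beyond len(u), even for an empty needle.
direct_past_end = u.startswith(b"", 20, 30)

# The same split appears when start > end inside the string.
sliced_reversed = u[3:2].startswith(b"")
direct_reversed = u.startswith(b"", 3, 2)

# For a non-empty needle the two spellings agree.
nonempty_agrees = u.startswith(b"lp", 1, 4) == u[1:4].startswith(b"lp")

print(sliced_past_end, direct_past_end)
print(sliced_reversed, direct_reversed)
print(nonempty_agrees)
```

Only the empty-needle cases differ between the two spellings, which is why the tests above branch on `if v:`.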
msg305882 - (view)	Author: R. David Murray (r.david.murray) *	Date: 2017-11-08 17:28
Can you please give examples of what you think the problem is?
msg305886 - (view)	Author: Ronan Lamy (Ronan.Lamy) *	Date: 2017-11-08 17:57
The problem is the complexity of the actual behaviour of these methods. It is impossible to get it right without looking at the source (at least, it was for me), and I doubt any ordinary user can correctly make use of the v='' behaviour, or predict what the return value will be in all cases.
msg305887 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) *	Date: 2017-11-08 18:05
See issue24284. `s1.startswith(s2, start, end)` for non-negative indices and non-tuple s2 is equivalent to expressions start + len(s2) <= end and s2[start: start + len(s2)] == s2 or s1.find(s2, start, end) == start
msg305891 - (view)	Author: Ronan Lamy (Ronan.Lamy) *	Date: 2017-11-08 19:17
Ah, thanks, I noticed the discrepancy between unicode and str in 2.7, but wondered when it was fixed. I guess I'm arguing that it was resolved in the wrong direction, then.

Now, your first expression is wrong, even after fixing the obvious typo. The correct version is:

    start + len(s2) <= min(len(s1), end) and s1[start: start + len(s2)] == s2

If the person who implemented the behaviour can't get it right, who will? ;-)

The second expression is correct, but I'll argue that it shows that find() also suffers from a discrepancy between its basic one-argument form and the extended ones.
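The expressions discussed above can be compared against startswith() by brute force over small non-negative indices. This is a rough sketch, assuming CPython 3; the helper names (corrected_expr, unclamped_expr, find_expr, mismatches) and the sample strings are hypothetical and chosen only for illustration.

```python
from itertools import product

def corrected_expr(s1, s2, start, end):
    # Expression with the end index clamped to len(s1).
    return start + len(s2) <= min(len(s1), end) and s1[start:start + len(s2)] == s2

def unclamped_expr(s1, s2, start, end):
    # Earlier expression with the typo fixed (s1 sliced, not s2) but no clamping.
    return start + len(s2) <= end and s1[start:start + len(s2)] == s2

def find_expr(s1, s2, start, end):
    # Equivalence stated in terms of find().
    return s1.find(s2, start, end) == start

haystacks = ["", "a", "ab", "aba", "abab"]
needles = ["", "a", "b", "ab", "ba"]
cases = [
    (s1, s2, start, end)
    for s1, s2 in product(haystacks, needles)
    for start, end in product(range(7), repeat=2)
]

def mismatches(expr):
    # Cases where the expression disagrees with the built-in method.
    return [c for c in cases if expr(*c) != c[0].startswith(c[1], c[2], c[3])]

print(len(mismatches(corrected_expr)))
print(len(mismatches(find_expr)))
print(mismatches(unclamped_expr)[:3])
```

On this sample, the unclamped expression disagrees with startswith() only for an empty s2 with start beyond len(s1). A small exhaustive check like this is not a proof, and it covers non-negative indices only.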
msg305901 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) *	Date: 2017-11-08 20:37
For the justification of the find() behavior see msg243668. But the strongest argument for this behavior is that find() has had it for a long time. Changing it would break existing code that depends on it. This argument is weaker in the case of startswith() and endswith() because their behavior for bytes and Unicode was inconsistent. But consistency with find() plays a role.
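For reference, the find() behaviour that startswith() is being kept consistent with looks like this for an empty needle. This is a small sketch, assuming CPython 3; the variable names are illustrative.

```python
s = "abc"

# An empty needle is found at the start position while start <= len(s)...
at_zero = s.find("")
at_len = s.find("", 3)

# ...but not once start exceeds len(s), even though the slice is still searchable.
past_len = s.find("", 4)
in_slice = s[4:].find("")

print(at_zero, at_len, past_len, in_slice)
```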
msg305922 - (view)	Author: Steven D'Aprano (steven.daprano) *	Date: 2017-11-08 23:55
Thank you for the bug report, Ronan, but I'm afraid that I have no idea what you think the problematic behaviour is. I'm not going to spend the time installing the third-party hypothesis module, and learning how to use it, just to decipher your "actual spec". Where did this spec come from? The documentation is fairly sparse: https://docs.python.org/3/library/stdtypes.html#str.startswith so I'm not sure where your spec comes from.

The title of this ticket is uninformative: what implementation details are being leaked? Saying "The problem is the complexity of the actual behaviour of these methods." explains nothing. Which actual behaviour?

Please provide simple examples that contrast expected behaviour with actual behaviour, and justification for the expected behaviour.
msg305923 - (view)	Author: Steven D'Aprano (steven.daprano) *	Date: 2017-11-09 00:03
I don't have Python 3.7 available to me, but in 3.5 the behaviour of u.startswith(v) with an empty v seems consistent to me:

py> "alpha".startswith("", 20, 30)
True
py> "alpha"[20:30].startswith("")
True
py> "".startswith("", 20, 30)
True
py> ""[20:30].startswith("")
True

So I can't see any inconsistency that might be fixed by always returning True in the case v="", as that appears to already be the case.

History
Date	User	Action	Args
2022-04-11 14:58:54	admin	set	github: 76165
2017-11-09 00:03:12	steven.daprano	set	messages: + msg305923
2017-11-08 23:55:48	steven.daprano	set	nosy: + steven.daprano messages: + msg305922
2017-11-08 20:37:48	serhiy.storchaka	set	messages: + msg305901
2017-11-08 19:17:49	Ronan.Lamy	set	messages: + msg305891
2017-11-08 18:45:36	barry	set	nosy: + barry
2017-11-08 18:05:54	serhiy.storchaka	set	nosy: + serhiy.storchaka messages: + msg305887
2017-11-08 17:57:39	Ronan.Lamy	set	messages: + msg305886
2017-11-08 17:28:14	r.david.murray	set	nosy: + r.david.murray messages: + msg305882
2017-11-08 16:59:43	Ronan.Lamy	create