Title: provide the authoritative source for s[i:j] negative slice indices (<-len(s)) behavior for standard sequences
Type: Stage:
Components: Documentation Versions: Python 3.7, Python 3.6, Python 3.5
Status: open Resolution:
Dependencies: Superseder:
Assigned To: docs@python Nosy List: akira, docs@python, josh.r
Priority: normal Keywords: patch

Created on 2017-01-23 16:49 by akira, last changed 2017-03-17 20:35 by akira.

File name Uploaded Description Edit
docs-negative-slice-indices.patch akira, 2017-01-23 16:49 review
Pull Requests
URL Status Linked Edit
PR 702 open akira, 2017-03-17 20:35
Messages (3)
msg286098 - (view) Author: Akira Li (akira) * Date: 2017-01-23 16:49
I've failed to find where the behavior for negative indices in an s[i:j]
expression (i, j < -len(s)) for standard sequences (str, list, etc.) is
formally defined.

The observed behavior is implemented in PySlice_GetIndicesEx(): if "len(s)
+ i" or "len(s) + j" is negative, 0 is used. [1] I don't see this in the docs.

        if (*start < 0) *start += length;
        if (*start < 0) *start = (*step < 0) ? -1 : 0;
        if (*stop < 0) *stop += length;
        if (*stop < 0) *stop = (*step < 0) ? -1 : 0;
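
For illustration, the four C lines above can be transcribed into Python and
compared with what str slicing and slice.indices() return. This is only a
sketch: adjust_lower() is a hypothetical helper name, not a CPython function,
and it covers the lower-bound handling only (the upper-bound clamping done
elsewhere in PySlice_GetIndicesEx() is left out).

```python
def adjust_lower(start, stop, step, length):
    """Hypothetical transcription of the quoted C lines (lower bound only)."""
    if start < 0:
        start += length
    if start < 0:
        start = -1 if step < 0 else 0
    if stop < 0:
        stop += length
    if stop < 0:
        stop = -1 if step < 0 else 0
    return start, stop


s = "abcde"

# Both indices are below -len(s): both become 0, so the slice is empty.
assert adjust_lower(-10, -7, 1, len(s)) == (0, 0)
assert s[-10:-7] == ""

# Only the start is below -len(s): it becomes 0.
assert adjust_lower(-10, 3, 1, len(s)) == (0, 3)
assert s[-10:3] == "abc"
assert slice(-10, 3).indices(len(s)) == (0, 3, 1)

# With a negative step the substituted value is -1 instead of 0.
assert adjust_lower(-10, -20, -1, len(s)) == (-1, -1)
assert slice(-10, -20, -1).indices(len(s)) == (-1, -1, -1)
```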

The tutorial mentions [2]:

> out of range slice indexes are handled gracefully when used for
> slicing

slice.indices() documentation says [3]:

> Missing or out-of-bounds indices are handled in a manner consistent
> with regular slices.

Neither defines it explicitly.

The behavior for the upper boundary is defined explicitly [4]:

> If *i* or *j* is greater than ``len(s)``, use ``len(s)``
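
A quick check of the documented upper-boundary rule, as a minimal sketch
using an arbitrary example list:

```python
s = [10, 20, 30]

# An index greater than len(s) is treated as len(s).
assert s[1:100] == s[1:len(s)] == [20, 30]
assert s[100:200] == []
assert slice(1, 100).indices(len(s)) == (1, 3, 1)
```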

I've attached a documentation patch that defines the behavior for the
lower boundary too.

[1] Objects/sliceobject.c
[2] Doc/tutorial/introduction.rst
[3] Doc/reference/datamodel.rst
[4] Doc/library/stdtypes.rst
msg286206 - (view) Author: Josh Rosenberg (josh.r) * Date: 2017-01-24 19:08
I think the wording could be improved, but there is another option I wanted to put here. Right now, we're being overly detailed about the implementation, specifying the bounds substitutions performed. If we're just trying to describe logical behavior, we could simplify footnote 4 by dropping the explicit descriptions of the "out of bounds" cases, giving:

The slice of *s* from *i* to *j* is defined as the sequence of items with index *k* such that ``i <= k < j`` and ``0 <= k < len(s)``. If *i* is omitted or ``None``, use ``0``. If *j* is omitted or ``None``, use ``len(s)``. If *i* is greater than or equal to *j*, the slice is empty.

That avoids needing to be explicit about substitutions in the < -len(s) case and > len(s) cases, since limiting the values of k to the intersection of range(i, j) and range(len(s)) covers both ends. I considered a single range, like ``max(0, i) <= k < min(j, len(s))``, but that wouldn't describe the upper bound on i or the lower bound on j properly, and ``min(max(0, i), len(s)) <= k < min(max(0, j), len(s))`` is ugly. Footnote 3 covers the adjustment for negative values already, which allows for the simpler description.
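
As a rough sanity check of the simplified definition, it can be written out literally and compared with built-in slicing. This is only a sketch: slice_by_definition() is a hypothetical name, it handles the default step only, and it assumes the negative-index adjustment from footnote 3 (substituting ``len(s) + i`` or ``len(s) + j``) is applied first.

```python
def slice_by_definition(s, i=None, j=None):
    """Items with index k such that i <= k < j and 0 <= k < len(s)."""
    n = len(s)
    i = 0 if i is None else i
    j = n if j is None else j
    # Footnote 3: a negative index is relative to the end of the sequence.
    if i < 0:
        i += n
    if j < 0:
        j += n
    # range(n) enforces 0 <= k < len(s); no explicit clamping is needed.
    return [s[k] for k in range(n) if i <= k < j]


s = "abcde"
candidates = [None] + list(range(-12, 13))
for i in candidates:
    for j in candidates:
        assert slice_by_definition(s, i, j) == list(s[i:j]), (i, j)
```

The loop covers indices below -len(s) and above len(s) on both sides, which are the cases the explicit substitutions would otherwise have to spell out.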
msg289780 - (view) Author: Akira Li (akira) * Date: 2017-03-17 20:05
I prefer the wording in the current patch, though I don't have strong feelings one way or the other as long as the behavior is specified explicitly.
Date User Action Args
2017-03-17 20:35:38 akira set pull_requests: + pull_request576
2017-03-17 20:05:00 akira set messages: + msg289780
2017-01-24 19:08:30 josh.r set nosy: + josh.r; messages: + msg286206
2017-01-23 16:49:16 akira create