This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Author josh.r
Recipients josh.r
Date 2021-10-08.18:11:18
SpamBayes Score -1.0
Marked as misclassified Yes
Message-id <1633716678.9.0.917396229112.issue45414@roundup.psfhosted.org>
In-reply-to
Content
At least on PosixPath (not currently able to try on Windows to check WindowsPath, but from a quick code check I think it'll behave the same way), the negative indexing added in #21041 is implemented incorrectly for absolute paths. Passing either -1 or -2 will return a path representing the root, '/' for PosixPath (which should only be returned for -1), and passing an index of -3 or beyond returns the value expected for that index + 1, e.g. -3 gets the result expected for -2, -4 gets the result for -3, etc. And for the negative index that should be equivalent to index 0, you end up with an IndexError.

The underlying problem appears to be that absolute paths (at least, those created from a string) are represented in self._parts with the root '/' included (redundantly, since self._root has it too), so all the actual components of the path are offset by one.

This does not affect slicing (slicing is implemented using range and slice.indices to perform normalization from negative to positive indices, so it never indexes with a negative index).

Example:

>>> from pathlib import Path
>>> p = Path('/1/2/3')
>>> p._parts
['/', '1', '2', '3']
>>> p.parents[:]
(PosixPath('/1/2'), PosixPath('/1'), PosixPath('/'))
>>> p.parents[-1]
PosixPath('/')
>>> p.parents[-1]._parts  # Still behaves normally as self._root is still '/'
[]
>>> p.parents[-2]
PosixPath('/')
>>> p.parents[-2]._parts
['/']
>>> p.parents[-3]
PosixPath('/1')
>>> p.parents[-4]
Traceback (most recent call last):
    ...
IndexError: -4

It looks like the underlying problem is that the negative indexing code doesn't account for the possibility of '/' being in _parts and behaving as a component separate from the directory/files in the path. Frankly, it's a little odd that _parts includes '/' at all (Path has a ._root/.root attribute that stores it too, and even when '/' isn't in the ._parts/.parts, the generated complete path includes it because of ._root), but it looks like the docs guaranteed that behavior in their examples.

It looks like one of two options must be chosen:

1. Fix the negative indexing code to account for absolute paths, and ensure absolute paths store '/' in ._parts consistently (it should not be possible to get two identical Paths, one of which includes '/' in _parts, one of which does not, which is possible with the current negative indexing bug; not sure if there are any documented code paths that might produce this warped sort of object outside of the buggy .parents), or

2. Make no changes to the negative indexing code, but make absolute paths *never* store the root as the first element of _parts (.parts can prepend self._drive/self._root on demand to match documentation). This probably involves more changes (lots of places assume _parts includes the root, e.g. the _PathParents class's own __len__ method raises a ValueError when called on the warped object returned by p.parents[-1], because it adjusts for the root, and the lack of one means it returns a length of -1).

I think #1 is probably the way to go. I believe all that would require is to add:

if idx < 0:
    return self.__getitem__(len(self) + idx)

just before:

return self._pathcls._from_parsed_parts(self._drv, self._root, self._parts[:-idx - 1])

so it never tries to use a negative idx directly (it has to occur after the check for valid index in [-len(self), len(self) so very negative indices don't recurse until they become positive).

This takes advantage of _PathParents's already adjusting the reported length for the presence of drive/root, keeping the code simple; the alternative I came up with that doesn't recurse changes the original return line:

return self._pathcls._from_parsed_parts(self._drv, self._root, self._parts[:-idx - 1])

to:

adjust = idx >= 0 or not (self._drv or self._root)
return self._pathcls._from_parsed_parts(self._drv, self._root, self._parts[:-idx - adjust])

which is frankly terrible, even if it's a little faster.
History
Date User Action Args
2021-10-08 18:11:18josh.rsetrecipients: + josh.r
2021-10-08 18:11:18josh.rsetmessageid: <1633716678.9.0.917396229112.issue45414@roundup.psfhosted.org>
2021-10-08 18:11:18josh.rlinkissue45414 messages
2021-10-08 18:11:18josh.rcreate