This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: ResourceReader for FileLoader inconsistently handles path separators
Type: behavior Stage: resolved
Components: Library (Lib) Versions: Python 3.8, Python 3.7
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: Mekk, barry, brett.cannon, indygreg, jaraco
Priority: normal Keywords:

Created on 2019-02-26 21:05 by indygreg, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Messages (7)
msg336712 - (view) Author: Gregory Szorc (indygreg) * Date: 2019-02-26 21:05
The implementation of the ResourceReader API for the FileLoader class in importlib/_bootstrap_external.py is inconsistent with regards to handling of path separators.

Specifically, "is_resource()" returns False if "resource" has a path separator. But "open_resource()" will happily open resources containing a path separator.

I would think the two would agree about whether a path with separators is a resource or not. The documentation at https://docs.python.org/3.7/library/importlib.html#importlib.abc.ResourceReader implies that resources in subdirectories should not be allowed.

One can easily demonstrate this behavior oddity with Mercurial:

(Pdb) p sys.modules['mercurial'].__spec__.loader.get_resource_reader('mercurial').open_resource('help/config.txt')
<_io.FileIO name='/home/gps/src/hg/mercurial/help/config.txt' mode='rb' closefd=True>
(Pdb) p sys.modules['mercurial'].__spec__.loader.get_resource_reader('mercurial').is_resource('help/config.txt')
False

The behavior has been present since the functionality was added (https://github.com/python/cpython/pull/5168).
msg336723 - (view) Author: Barry A. Warsaw (barry) * (Python committer) Date: 2019-02-26 23:45
On Feb 26, 2019, at 13:05, Gregory Szorc <report@bugs.python.org> wrote:
> 
> I would think the two would agree about whether a path with separators is a resource or not. The documentation at https://docs.python.org/3.7/library/importlib.html#importlib.abc.ResourceReader implies that resources in subdirectories should not be allowed.

Historical context is in the standalone tracker:

https://gitlab.com/python-devs/importlib_resources/issues/58

Clearly, with the current definition of the API, open_resource() should not allow slashes.  Please follow up on the above link if you have opinions about changing the behavior.  We would have an opportunity to relax that constraint in Python 3.8, if it’s something we want to do.
msg356361 - (view) Author: Marcin Kasperski (Mekk) Date: 2019-11-11 14:13
Hmm, I noticed this but accidentally and tried to port https://github.com/pypa/setuptools/issues/1635 to new api. Well:

>>> import multiprocessing
>>> import sys
>>> reader = sys.modules['multiprocessing'].__spec__.loader.get_resource_reader('multiprocessing')
>>> reader.open_resource('../../../../etc/passwd')
<_io.FileIO name='/usr/lib/python3.7/multiprocessing/../../../../etc/passwd' mode='rb' closefd=True>

I suppose this is the case which deserve some thought (originally I faced it when some webapp used pkg_resources to provide static files and used resource api as a way to validate urls impacted by external input).

Tested on python 3.7.3, on Ubuntu 19.04.
msg356378 - (view) Author: Barry A. Warsaw (barry) * (Python committer) Date: 2019-11-11 18:24
One simple restriction would be to disallow relative paths outside of the resource anchor location.
msg356408 - (view) Author: Gregory Szorc (indygreg) * Date: 2019-11-12 03:12
I think disallowing relative paths that are parents of the current anchor point is a reasonable restriction and acceptable backwards incompatible behavior.

Disallowing all relative paths with slashes is a larger issue. I would support that if designing/implementing things today as it simplifies things greatly. But since an implementation allowing slashes has shipped in 3.7 and 3.8, I'm not sure if the backwards incompatibility could be stomached, so I'm not going to advocate for it.
msg356409 - (view) Author: Gregory Szorc (indygreg) * Date: 2019-11-12 03:18
I just noticed that there is a parallel discussion ongoing in https://gitlab.com/python-devs/importlib_resources/issues/58.
msg394507 - (view) Author: Jason R. Coombs (jaraco) * (Python committer) Date: 2021-05-27 01:56
The preferred API as implemented in Python 3.9 and importlib_resources 1.1 is the `files()` API. This simpler API returns a Traversable object, a pathlib-like handle to the contents of a package. This approach side-steps the issues described above. In particular, `is_resource` no longer has a purpose. Path traversal is handled naturally through `Traversable.join_path`. Resources in subdirectories are now supported.

Parent objects ('..') are allowed, but only incidentally and allowed in the same way as they're allowed for any Python code. That is, one can call `files('multiprocessing').joinpath('../../../../etc/passwd')`, but that provides no advantage over `pathlib.Path('/etc/passwd')`.

I believe this new API addresses the concerns presented.

Please open a new issue (here or in github.com/python/importlib_resources) if there are further concerns needing attention.
History
Date User Action Args
2022-04-11 14:59:11adminsetgithub: 80309
2021-05-27 01:56:33jaracosetstatus: open -> closed

nosy: + jaraco
messages: + msg394507

resolution: fixed
stage: resolved
2019-11-12 03:18:08indygregsetmessages: + msg356409
2019-11-12 03:12:45indygregsetmessages: + msg356408
2019-11-11 18:24:24barrysetmessages: + msg356378
2019-11-11 14:13:26Mekksetnosy: + Mekk
messages: + msg356361
2019-02-27 22:58:33brett.cannonsetnosy: + brett.cannon
2019-02-26 23:45:24barrysetmessages: + msg336723
2019-02-26 21:05:00indygregcreate