Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ResourceReader for FileLoader inconsistently handles path separators #80309

Closed
indygreg mannequin opened this issue Feb 26, 2019 · 7 comments
Closed

ResourceReader for FileLoader inconsistently handles path separators #80309

indygreg mannequin opened this issue Feb 26, 2019 · 7 comments
Labels
3.7 (EOL) end of life 3.8 only security fixes stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error

Comments

@indygreg
Copy link
Mannequin

indygreg mannequin commented Feb 26, 2019

BPO 36128
Nosy @warsaw, @brettcannon, @jaraco, @indygreg

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

Show more details

GitHub fields:

assignee = None
closed_at = <Date 2021-05-27.01:56:33.735>
created_at = <Date 2019-02-26.21:05:00.528>
labels = ['3.7', '3.8', 'type-bug', 'library']
title = 'ResourceReader for FileLoader inconsistently handles path separators'
updated_at = <Date 2021-05-27.01:56:33.734>
user = 'https://github.com/indygreg'

bugs.python.org fields:

activity = <Date 2021-05-27.01:56:33.734>
actor = 'jaraco'
assignee = 'none'
closed = True
closed_date = <Date 2021-05-27.01:56:33.735>
closer = 'jaraco'
components = ['Library (Lib)']
creation = <Date 2019-02-26.21:05:00.528>
creator = 'indygreg'
dependencies = []
files = []
hgrepos = []
issue_num = 36128
keywords = []
message_count = 7.0
messages = ['336712', '336723', '356361', '356378', '356408', '356409', '394507']
nosy_count = 5.0
nosy_names = ['barry', 'brett.cannon', 'jaraco', 'Mekk', 'indygreg']
pr_nums = []
priority = 'normal'
resolution = 'fixed'
stage = 'resolved'
status = 'closed'
superseder = None
type = 'behavior'
url = 'https://bugs.python.org/issue36128'
versions = ['Python 3.7', 'Python 3.8']

@indygreg
Copy link
Mannequin Author

indygreg mannequin commented Feb 26, 2019

The implementation of the ResourceReader API for the FileLoader class in importlib/_bootstrap_external.py is inconsistent with regards to handling of path separators.

Specifically, "is_resource()" returns False if "resource" has a path separator. But "open_resource()" will happily open resources containing a path separator.

I would think the two would agree about whether a path with separators is a resource or not. The documentation at https://docs.python.org/3.7/library/importlib.html#importlib.abc.ResourceReader implies that resources in subdirectories should not be allowed.

One can easily demonstrate this behavior oddity with Mercurial:

(Pdb) p sys.modules['mercurial'].__spec__.loader.get_resource_reader('mercurial').open_resource('help/config.txt')
<io.FileIO name='/home/gps/src/hg/mercurial/help/config.txt' mode='rb' closefd=True>
(Pdb) p sys.modules['mercurial'].__spec
_.loader.get_resource_reader('mercurial').is_resource('help/config.txt')
False

The behavior has been present since the functionality was added (#5168).

@indygreg indygreg mannequin added 3.7 (EOL) end of life 3.8 only security fixes stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error labels Feb 26, 2019
@warsaw
Copy link
Member

warsaw commented Feb 26, 2019

On Feb 26, 2019, at 13:05, Gregory Szorc <report@bugs.python.org> wrote:

I would think the two would agree about whether a path with separators is a resource or not. The documentation at https://docs.python.org/3.7/library/importlib.html#importlib.abc.ResourceReader implies that resources in subdirectories should not be allowed.

Historical context is in the standalone tracker:

https://gitlab.com/python-devs/importlib_resources/issues/58

Clearly, with the current definition of the API, open_resource() should not allow slashes. Please follow up on the above link if you have opinions about changing the behavior. We would have an opportunity to relax that constraint in Python 3.8, if it’s something we want to do.

@Mekk
Copy link
Mannequin

Mekk mannequin commented Nov 11, 2019

Hmm, I noticed this but accidentally and tried to port pypa/setuptools#1635 to new api. Well:

>>> import multiprocessing
>>> import sys
>>> reader = sys.modules['multiprocessing'].__spec__.loader.get_resource_reader('multiprocessing')
>>> reader.open_resource('../../../../etc/passwd')
<_io.FileIO name='/usr/lib/python3.7/multiprocessing/../../../../etc/passwd' mode='rb' closefd=True>

I suppose this is the case which deserve some thought (originally I faced it when some webapp used pkg_resources to provide static files and used resource api as a way to validate urls impacted by external input).

Tested on python 3.7.3, on Ubuntu 19.04.

@warsaw
Copy link
Member

warsaw commented Nov 11, 2019

One simple restriction would be to disallow relative paths outside of the resource anchor location.

@indygreg
Copy link
Mannequin Author

indygreg mannequin commented Nov 12, 2019

I think disallowing relative paths that are parents of the current anchor point is a reasonable restriction and acceptable backwards incompatible behavior.

Disallowing all relative paths with slashes is a larger issue. I would support that if designing/implementing things today as it simplifies things greatly. But since an implementation allowing slashes has shipped in 3.7 and 3.8, I'm not sure if the backwards incompatibility could be stomached, so I'm not going to advocate for it.

@indygreg
Copy link
Mannequin Author

indygreg mannequin commented Nov 12, 2019

I just noticed that there is a parallel discussion ongoing in https://gitlab.com/python-devs/importlib_resources/issues/58.

@jaraco
Copy link
Member

jaraco commented May 27, 2021

The preferred API as implemented in Python 3.9 and importlib_resources 1.1 is the files() API. This simpler API returns a Traversable object, a pathlib-like handle to the contents of a package. This approach side-steps the issues described above. In particular, is_resource no longer has a purpose. Path traversal is handled naturally through Traversable.join_path. Resources in subdirectories are now supported.

Parent objects ('..') are allowed, but only incidentally and allowed in the same way as they're allowed for any Python code. That is, one can call files('multiprocessing').joinpath('../../../../etc/passwd'), but that provides no advantage over pathlib.Path('/etc/passwd').

I believe this new API addresses the concerns presented.

Please open a new issue (here or in github.com/python/importlib_resources) if there are further concerns needing attention.

@jaraco jaraco closed this as completed May 27, 2021
@ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3.7 (EOL) end of life 3.8 only security fixes stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error
Projects
None yet
Development

No branches or pull requests

2 participants