Title: Docs for pkgutil.get_data inconsistent with semantics
Type: Stage:
Components: Library (Lib) Versions: Python 3.7, Python 3.6
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: Antony.Lee, WGH, p-ganssle, twouters
Priority: normal Keywords:

Created on 2015-10-07 03:06 by Antony.Lee, last changed 2020-09-11 22:09 by brett.cannon.

Messages (6)
msg252450 - (view) Author: Antony Lee (Antony.Lee) * Date: 2015-10-07 03:06
The docs of pkgutil.get_data say "The resource argument should be in the form of a relative filename, using / as the path separator. The parent directory name .. is not allowed, and nor is a rooted name (starting with a /)."

In fact (on Python 3.5 at least):
* pkgutil.get_data("logging", "/") works, but simply chops off the first slash, returning the contents of the stdlib's logging/
* pkgutil.get_data("logging", "../") works, returning the contents of the stdlib's
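
A minimal sketch of why both calls succeed, assuming the path is built the way pkgutil's source appears to build it: split the resource on "/", prepend the package directory, and join. The helper name `build_resource_path` is hypothetical and is not part of pkgutil.

```python
import os


def build_resource_path(package_dir, resource):
    """Hypothetical helper mirroring how the resource path seems to be built."""
    # A leading "/" yields an empty first component, which os.path.join
    # effectively ignores, so the rooted name is treated as a relative one.
    parts = resource.split("/")
    # ".." components are passed through untouched, so they can climb
    # out of the package directory.
    parts.insert(0, package_dir)
    return os.path.join(*parts)


rooted = build_resource_path("pkg", "/data.txt")
parent = build_resource_path("pkg", "../data.txt")
print(rooted)
print(os.path.normpath(parent))
```

Nothing in this construction rejects either form, which would explain the behaviour reported above.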

People who have actually thought about the implications of get_data/zipimport/etc. can decide whether to remove this functionality or update the docs; I'm just reporting it.

Also, it would be nice if get_data gained a "text mode" (i.e. returning str instead of bytes and with support for universal newlines).
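
Until something like that exists, a caller can approximate a "text mode" on top of the bytes API. This is only a sketch: the name `get_text` and the UTF-8 default are assumptions for illustration, not an existing pkgutil feature.

```python
import io
import pkgutil


def get_text(package, resource, encoding="utf-8"):
    """Hypothetical wrapper: return a resource as str with universal newlines."""
    data = pkgutil.get_data(package, resource)
    if data is None:
        # Package not found, or its loader does not support get_data.
        return None
    # newline=None enables universal newlines: "\r\n" and "\r" become "\n".
    wrapper = io.TextIOWrapper(io.BytesIO(data), encoding=encoding, newline=None)
    return wrapper.read()
```

Wrapping the bytes in `io.TextIOWrapper` reuses the same newline translation that `open()` applies in text mode, rather than reimplementing it by hand.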
msg252480 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2015-10-07 17:38
Changing this to be a single issue about the difference between the docs and the semantics of pkgutil.

The feature request can be made into a separate issue, but there are plans to replace the API with a more stringently defined one in importlib.
msg263243 - (view) Author: WGH (WGH) Date: 2016-04-12 10:42
I think it can even be considered a security bug: a classic path traversal. The fact that the documentation falsely suggests that there's no such vulnerability is clearly not helping.

Python 2.7 is affected as well, by the way.
msg263268 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2016-04-12 17:40
This can't change in Python 2.7 because of backwards-compatibility. And I would argue this isn't a serious security risk as pkgutil.get_data() typically works with string constants and values provided by the library and not user-provided values. This is basically the same as taking a value for open() and has the same risks.
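
For callers that do pass untrusted values, the restrictions the docs describe can be enforced before calling get_data. A minimal sketch, where `check_resource` and `get_data_checked` are hypothetical names and only the two documented restrictions (no rooted name, no ".." component) are checked:

```python
import pkgutil


def check_resource(resource):
    """Hypothetical validator for the rules stated in the get_data docs."""
    if resource.startswith("/"):
        raise ValueError("rooted resource names are not allowed: %r" % resource)
    if ".." in resource.split("/"):
        raise ValueError("parent directory '..' is not allowed: %r" % resource)
    return resource


def get_data_checked(package, resource):
    """Validate the resource name, then delegate to pkgutil.get_data."""
    return pkgutil.get_data(package, check_resource(resource))
```

This does not cover platform-specific separators such as a backslash on Windows, so it should be read as an illustration of the documented rules rather than a complete traversal guard.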
msg310777 - (view) Author: Paul Ganssle (p-ganssle) * (Python committer) Date: 2018-01-26 17:15
I'm not sure if this warrants a separate issue, but I also notice this in the documentation:

> If the package cannot be located or loaded, or it uses a loader which does not support get_data, then None is returned. In particular, the loader for namespace packages does not support get_data.

But in reality this seems to raise a FileNotFoundError:

    >>> import pkgutil
    >>> data = pkgutil.get_data('dateutil.zoneinfo', 'dateutil-zoneinfo.tar.gz')
    >>> len(data)
    >>> data = pkgutil.get_data('dateutil.zoneinfo', 'foo-bar.tar.gz')
    FileNotFoundError: [Errno 2] No such file or directory: '/usr/lib/python3.6/site-packages/dateutil/zoneinfo/foo-bar.tar.gz'

Am I misunderstanding the documentation, or should the failure mode be corrected to specify that it raises an error?
msg310879 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2018-01-27 18:53
Notice that returning None only has to do with searching for the *package*, not the *data file*. So I think the docs are still correct according to your example, Paul.
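
The two failure modes discussed above can be told apart by the caller. A sketch under the assumption that a missing data file surfaces as an OSError (FileNotFoundError is a subclass); the function name `load_resource` and the status strings are made up for illustration:

```python
import pkgutil


def load_resource(package, resource):
    """Hypothetical helper returning (status, data) for a package resource."""
    try:
        data = pkgutil.get_data(package, resource)
    except OSError:
        # The package was found, but the data file could not be read.
        return ("no-file", None)
    if data is None:
        # The package could not be located, or its loader lacks get_data.
        return ("no-package", None)
    return ("ok", data)
```

A call such as `load_resource("dateutil.zoneinfo", "foo-bar.tar.gz")` would be expected to take the first branch in the situation Paul describes, assuming dateutil is installed.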
History
Date                 User          Action  Args
2020-09-11 22:09:18  brett.cannon  set     nosy: + twouters
2020-09-11 22:09:00  brett.cannon  set     nosy: - brett.cannon
2018-01-27 18:53:45  brett.cannon  set     messages: + msg310879
2018-01-26 17:15:18  p-ganssle     set     nosy: + p-ganssle; messages: + msg310777; versions: + Python 3.7
2016-04-12 17:40:12  brett.cannon  set     messages: + msg263268
2016-04-12 10:42:36  WGH           set     nosy: + WGH; messages: + msg263243
2015-10-07 17:38:36  brett.cannon  set     nosy: + brett.cannon; messages: + msg252480; title: Two issues with pkgutil.get_data -> Docs for pkgutil.get_data inconsistent with semantics
2015-10-07 03:06:14  Antony.Lee    create