classification
Title: [security] Information disclosure via pydoc -p: /getfile?key=path allows to read arbitrary file on the filesystem
Type: security Stage: patch review
Components: Library (Lib) Versions: Python 3.10, Python 3.9, Python 3.8, Python 3.7, Python 3.6
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: hroncok, kj, lemburg, lukasz.langa, mdk, ned.deily, serhiy.storchaka, vstinner
Priority: critical Keywords: patch

Created on 2021-01-21 12:18 by hroncok, last changed 2021-02-03 17:51 by ned.deily.

Pull Requests
URL Status Linked Edit
PR 24285 open kj, 2021-01-21 16:59
PR 24337 open serhiy.storchaka, 2021-01-26 13:01
Messages (19)
msg385412 - (view) Author: Miro Hrončok (hroncok) * Date: 2021-01-21 12:18
Hello Python security,
a Fedora user has reported the following security vulnerability to us (I was able to verify it):

Running `pydoc -p` allows other local users to extract arbitrary files.

Steps to Reproduce:
1. start pydoc on a port
2. as a different user guess or extract the port
3. call getfile on the server to extract arbitrary files, e.g. http://localhost:8888/getfile?key=/home/dave/.ssh/id_rsa

Actual results:
any local user on the multi-user system can read all my keys and secrets

Expected results:
Access is prevented.

Additional info:
At least a warning should be printed, that this is insecure on multi-user systems.

Python notebook works around this by providing a token that is required to access the notepad. Depending on the system being able to read arbitrary files can allow to impersonate my, by  e.g. stealing my ssh-key (if it is non-encrypted) 



I've originally reported this to security@python.org but I was asked to open a public issue here.
msg385413 - (view) Author: Julien Palard (mdk) * (Python committer) Date: 2021-01-21 12:58
Nice find! I am able to reproduce it too in many Python releases.

I see differnt ways we can fix it:


# Using a random secret generated at startup time

Used any way, like by providing an hmac on getfile urls to ensure they are signed with the server secret.

Con: getfile URLS won't work from a run to another run (the secret should be random and changed at every start), and can't be shared (do someone share them in the first place?)


# Allowlist according to sys.path

In getfile implementation, we can check if the asked path is under a path from sys.path.

Con: If someone have ~/ in sys.path, we still can access all its home, or if someone start it using `python -m pydoc` while being in its home directory as Python will prepend the cwd in sys.path.


# Allowlist populated while generating links

Idea is: each time the server generates a getfile link, the target is added to an allowlist.

Each time a getfile link is requested, if the file is not in the allowlist, request is denied.

Con: Refreshing a page won't work after a server restart (thus having an empty allowlist).


# fnmatch allowlist

We could allow only `.py` files.

Con: Secrets stored in `.py` files under user project could still be leaked.


-----------------

My personal preference goes for the allowlist populated while generating links.
msg385415 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-01-21 13:13
Downstream Fedora issue: https://bugzilla.redhat.com/show_bug.cgi?id=1917807
msg385418 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2021-01-21 13:44
Looking at the _url_handler() code in pydoc.py, this was clearly not written with web server standards in mind. None of the handlers apply security checks on the user input and there are most likely several other vulnerabilities in there to be found.

It's not just the getfile query allowing reading arbitrary files. The user may well have code in his or her Python installation which is not meant to be published to other users on the same server.

I'd suggest to print a big warning on the console, explaining that the web server will potentially make all content accessible by the user visible to anyone else on the same server.

Perhaps adding some extra check to the html_getfile() handler would be good as well, making sure that the path is on sys.path and maps to a Python file (there could be non-Python file resources in package dirs as well).

Alternatively, perhaps the whole getfile logic could be removed and the web server just provide the path to the source file (as file:// link), so that the user can easily open it, but needs access permissions for this to be successful.
msg385420 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-01-21 13:53
I searched for "pydoc by Ka-Ping Yee" in Google and only found two online pydoc services:

* https://gae-pydoc.appspot.com/
* http://www.cc.kyoto-su.ac.jp/~atsushi/Programs/VisualWorks/CSV2HTML/CSV2HTML_PyDoc/index_of_modules.html

The first one runs on Python 2.7 which doesn't have the getfile feature (added in Python 3.6), the second one is a static website.

=> there is no public vulnerable website: good!

I don't think that pydoc is commonly used to run a server on the Internet.
msg385421 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-01-21 13:55
An option is also to remove the whole getfile feature. It was added in bpo-2001 by:

commit 7bb30b72d8a165f8bacbc480b8d5a15834fa4c35
Author: Nick Coghlan <ncoghlan@gmail.com>
Date:   Fri Dec 3 09:29:11 2010 +0000

    Improve Pydoc interactive browsing (#2001).  Patch by Ron Adam.
    
    * A -b option to start an enhanced browsing session.
    * Allow -b and -p options to be used together.
    * Specifying port 0 will pick an arbitrary unused socket port.
    * A new browse() function to start the new server and browser.
    * Show Python version information in the header.
    * A *Get* field which takes the same input as the help() function.
    * A *Search* field which replaces the Tkinter search box.
    * Links to *Module Index*, *Topics*, and *Keywords*.
    * Improved source file viewing.
    * An HTMLDoc.filelink() method.
    * The -g option and the gui() and serve() functions are deprecated.
msg385422 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-01-21 13:57
The getfile feature is used to display the source code of a Python module.

For example, on the difflib documentation, there a link to difflib.py. If you click, a webpage displays the content of the file.

I suggest to remove the whole feature. I don't think that it's so useful, compared to the vulnerability.
msg385435 - (view) Author: Ken Jin (kj) * Date: 2021-01-21 17:32
I created a PR to remove the getfile function - now it just places the hyperlinked file path there but clicking on it won't render the file contents.

Personally I agree with Marc-Andre Lemburg's comments on how _url_handler probably has other vulnerabilities somewhere. But I don't really see an easy solution other than removing the web server altogether. It uses http.server, which has a disclaimer on the docs page saying it isn't recommended for production. Someone looking hard enough can probably find a few more vulnerabilities in http.server itself rather than just pydoc.

I think the "Allowlist populated while generating links" suggested by Julien is pretty clever. 

I thought about file: // approach too - it's probably the most secure. But it would require a lot of change (and also generating all the .py files to .html initially).

Maybe I'll make a PR exploring the other approaches if the current one isn't favorable.

Thanks for your time.
msg385460 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-01-22 00:28
> I'd suggest to print a big warning on the console, explaining that the web server will potentially make all content accessible by the user visible to anyone else on the same server.

I dislike this idea. If they are vulnerabilities, they should be fixed. Users usually have no idea what to do when seeing such warning.
msg385485 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2021-01-22 09:05
On 22.01.2021 01:28, STINNER Victor wrote:
> 
> STINNER Victor <vstinner@python.org> added the comment:
> 
>> I'd suggest to print a big warning on the console, explaining that the web server will potentially make all content accessible by the user visible to anyone else on the same server.
> 
> I dislike this idea. If they are vulnerabilities, they should be fixed. Users usually have no idea what to do when seeing such warning.

The problem is that neither the docs nor the help text in the command
make it clear what exactly is exposed via the web server pydoc
launches.

While the getfile API endpoint can be used to view non-Python files
as well (which is certainly not intended), the tool also makes available
all Python modules which can be found on sys.path of the user starting
pydoc -p. It shows all doc-strings, functions, the class structure and
literal values of any constants found in those modules.

In a corporate environment this can easily result in data leaks
of e.g. unreleased software, personal information, disclosure of
NDA protected code, designs, algorithms and other secrets.

Fixing just getfile or replacing those links with file:// ones will
only address one part of the problem. The other is educating the
user about possible consequences of running a server on the machine
-- just like you warn users about deleting files before going ahead
with it.

Python's http.server at least warns about this in the docs:
https://docs.python.org/3/library/http.server.html
and limits the serving to the current dir (and subdirs).

My guess is that pydoc -p really is just intended to be useful
for the current user. Rather than having it serve files under
a blanket URL, it could restrict browsing to a random URL
token generated at pydoc startup and open this in the browser
via the "b" command or the -b option, e.g.

"""
Server ready at http://localhost:8080/uLy6t87AD-ScPthd/
Server commands: [b]rowser, [q]uit
server>
"""

That would make it harder to guess the base URL and limit
exposure.
msg385488 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2021-01-22 10:34
Why not limit the serving to sys.path?
msg385489 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-01-22 10:58
Fidget-Spinner wrote on the PR:
> AFAIK no. However, pydoc currently works by calling inspect on files it sees in path, and this may reveal private code as Marc-Andre Lemburg pointed out on the bpo. I will try the random url token he suggested via secrets.token_urlsafe to see if it helps.

pydoc shows global constant values in the doc. So yes, if you find a settings.py of a Django project, you can discover secrets.

I'm working on bpo-42955 "Add sys.module_names: list of stdlib module names (Python and extension modules)".

One option would be to restrict pydoc to stdlib modules by defaults, and ask to opt-in for discovery of any module installed on the system (sys.path).
msg385492 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-01-22 11:39
> Python's http.server at least warns about this in the docs:
> https://docs.python.org/3/library/http.server.html
> and limits the serving to the current dir (and subdirs).

I would be fine with a warning in the pydoc documentation, but I dislike warnings display on the command line. When I see such warning, I feel that the machine considers that I'm dumb and I have no idea of what I am doing.

If it's unsafe, can we make it safe by default?
msg385503 - (view) Author: Ken Jin (kj) * Date: 2021-01-22 16:05
I have updated the PR to do the following:

- removed html_getfile
- implement a unique secret as suggested above

Now it says:
>>> python.exe -m pydoc -b
Server ready at http://localhost:52035/Y1YzOyEbitE9BB_dtH0YXbMgGXbcg3ytXLpvpg8P7GEM3z1hlCkTXgxaojtAordVqs2s6oHZHPMbXqq9mXq_wbJCVW8jnHrgQeYE5hFUQuI/


FWIW, it seems that Jupyter notebook server deals with the same problems in a similar manner: https://jupyter-notebook.readthedocs.io/en/stable/security.html#security-in-the-jupyter-notebook-server

I removed the warning message in the PR because I think this is secure enough.
msg385710 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2021-01-26 13:04
PR 24337 uses different approach. It keeps compatibility, but checks that the argument is a file path of the source of one of modules (using the same algorithm as /search).
msg385721 - (view) Author: Ken Jin (kj) * Date: 2021-01-26 15:52
@Serhiy,

While this approach solves the getfile problem, I don't think this will solve the other problem of pydoc leaking secrets stored in python files:

Quoting from Marc-Andre Lemburg's message:
> the tool also makes available all Python modules which can be found on sys.path of the user starting pydoc -p. It shows all doc-strings, functions, the class structure and literal values of any constants found in those modules.
> In a corporate environment this can easily result in data leaks of e.g. unreleased software, personal information, disclosure of NDA protected code, designs, algorithms and other secrets.

Quoting from Victor's messages:
> pydoc shows global constant values in the doc. So yes, if you find a settings.py of a Django project, you can discover secrets.

Ultimately, the problem seems to be that .py files (other than those in the stdlib) may contain sensitive info, which pydoc can read.
msg385866 - (view) Author: Ned Deily (ned.deily) * (Python committer) Date: 2021-01-28 14:46
Resolution of this issue is blocking 3.7.x and 3.6.x security releases and threatens to block upcoming maintenance releases.
msg386221 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-02-03 17:35
While this vulnerability is bad, it only impacts very few users who run pydoc server. I suggest to not hold the incoming Python release (remove the "release blocker" priority) just for this one. If it's fixed before: great! But IMO it can wait for another Python release.
msg386222 - (view) Author: Miro Hrončok (hroncok) * Date: 2021-02-03 17:37
I agree.
History
Date User Action Args
2021-02-03 17:51:39ned.deilysetpriority: release blocker -> critical
2021-02-03 17:37:50hroncoksetmessages: + msg386222
2021-02-03 17:35:59vstinnersetmessages: + msg386221
2021-01-28 14:46:42ned.deilysetpriority: normal -> release blocker
title: Information disclosure via pydoc -p: /getfile?key=path allows to read arbitrary file on the filesystem -> [security] Information disclosure via pydoc -p: /getfile?key=path allows to read arbitrary file on the filesystem
nosy: + lukasz.langa, ned.deily

messages: + msg385866
2021-01-26 15:52:43kjsetmessages: + msg385721
2021-01-26 14:10:10vstinnersetpull_requests: - pull_request23159
2021-01-26 13:51:19vstinnersetpull_requests: + pull_request23159
2021-01-26 13:50:54vstinnersetpull_requests: - pull_request23157
2021-01-26 13:49:35vstinnersetpull_requests: + pull_request23157
2021-01-26 13:04:03serhiy.storchakasetmessages: + msg385710
2021-01-26 13:01:32serhiy.storchakasetpull_requests: + pull_request23156
2021-01-22 16:05:20kjsetmessages: + msg385503
2021-01-22 11:39:14vstinnersetmessages: + msg385492
2021-01-22 10:58:14vstinnersetmessages: + msg385489
2021-01-22 10:34:27serhiy.storchakasetnosy: + serhiy.storchaka
messages: + msg385488
2021-01-22 09:05:05lemburgsetmessages: + msg385485
2021-01-22 00:28:34vstinnersetmessages: + msg385460
2021-01-21 17:32:31kjsetmessages: + msg385435
2021-01-21 16:59:38kjsetkeywords: + patch
nosy: + kj

pull_requests: + pull_request23104
stage: patch review
2021-01-21 13:57:51vstinnersetmessages: + msg385422
2021-01-21 13:55:20vstinnersetmessages: + msg385421
2021-01-21 13:53:44vstinnersetmessages: + msg385420
title: Information disclosure via pydoc -p -> Information disclosure via pydoc -p: /getfile?key=path allows to read arbitrary file on the filesystem
2021-01-21 13:44:38lemburgsetnosy: + lemburg
messages: + msg385418
2021-01-21 13:13:57vstinnersetmessages: + msg385415
2021-01-21 13:11:23vstinnersetnosy: + vstinner
2021-01-21 12:58:23mdksetnosy: + mdk
messages: + msg385413
2021-01-21 12:18:37hroncokcreate