classification
Title: Directory traversal with http.server and SimpleHTTPServer on windows
Type: security Stage: resolved
Components: Library (Lib), Windows Versions: Python 3.6, Python 3.5, Python 2.7
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: Thomas, martin.panter, paul.moore, phihag, python-dev, steve.dower, tim.golden, xiang.zhang, zach.ware
Priority: normal Keywords: patch

Created on 2016-03-28 15:30 by Thomas, last changed 2017-03-27 14:04 by haypo. This issue is now closed.

Files
File name Uploaded Description Edit
fuzz.py Thomas, 2016-03-28 15:30 Fuzzing test
fix-path-traversal-26657.patch phihag, 2016-03-28 19:59 Patch to fix the problem (includes testcase) review
fix-path-traversal-26657.patch phihag, 2016-03-28 20:51 Patch to fix the problem (includes testcase, updated) review
fix-path-traversal-26657.v3.patch martin.panter, 2016-04-02 03:59 review
Pull Requests
URL Status Linked Edit
PR 782 open haypo, 2017-03-23 12:28
PR 226 closed haypo, 2017-03-27 14:04
Messages (13)
msg262572 - (view) Author: Thomas (Thomas) * Date: 2016-03-28 15:30
SimpleHTTPServer and http.server allow directory traversal on Windows.
To exploit this vulnerability, replace all ".." in URLs with "c:c:c:..".


Example:
Run
python -m http.server
and visit
127.0.0.1:8000/c:c:c:../secret_file_that_should_be_secret_but_is_not.txt


There is a warning that those modules are not secure in the module docs,
but for some reason they do not appear in the online docs:
https://docs.python.org/3/library/http.server.html
https://docs.python.org/2/library/simplehttpserver.html


It would be nice if that warning was as apparent as for example here:
https://docs.python.org/2/library/xml.etree.elementtree.html


There are a lot of other URLs that are insecure as well, which can all
be traced back to here:
https://hg.python.org/cpython/file/tip/Lib/http/server.py#l766


The splitdrive and split functions, which should make sure that the
final output is free of ".." are only called once which leads to this
control flow:
---------------------------------------------------------------
path = "c:/secret/public"
word = "c:c:c:.."

_, word = os.path.splitdrive(word) # word = "c:c:.."
_, word = os.path.split(word) # word = "c:.."
path = os.path.join(path, word) # path = "c:/secret/public\\.."
---------------------------------------------------------------


Iterating splitdrive and split seems safer:
---------------------------------------------------------------
for word in words:
    # Call split and splitdrive multiple times until
    # word does not change anymore.
    has_changed = True
    while has_changed:
        previous_word = word
        _, word = os.path.split(word)
        _, word = os.path.splitdrive(word)
        has_changed = word != previous_word
---------------------------------------------------------------






There is another weird thing which I am not quite sure about here:
https://hg.python.org/cpython/file/tip/Lib/http/server.py#l761

---------------------------------------------------------------
path = posixpath.normpath(path)
words = path.split('/')
---------------------------------------------------------------

posixpath.normpath does not do anything with backslashes and then the
path is split by forward slashes, so it may still contain backslashes.
Maybe replacing posixpath.normpath with os.path.normpath and then
splitting by os.sep would work, but I don't have enough different
operating systems to test this, so someone else should have a look.





I have attached some simple fuzzing test that tries a few weird URLs and
sees if they lead where they shouldn't.
Disclaimer: Might still contain other bugs.
msg262581 - (view) Author: Philipp Hagemeister (phihag) * Date: 2016-03-28 19:59
Please find attached a patch which adds a testcase for Windows (on all platforms) as well as code to fix the problem. Since os.path.split returns everything after the final slash/backslash, it only needs to be called once.

Note that the usage of posixpath is correct and only relates to the URL parsing - it powers foo/bar/../../ .

The path elements may indeed contain backslashes - that's why we call os.path.split on them.
msg262583 - (view) Author: Philipp Hagemeister (phihag) * Date: 2016-03-28 20:51
Update testcase, and call split before splitdrive
msg262585 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2016-03-28 23:02
Thomas: can you point to the “warning that those modules are not secure in the module docs”? All I can see is a warning in the pydoc output for http.server.__doc__, but that is specifically about the CGI server.

The specific bug with allowing c:c:c:.. looks like it would have been triggered by fixing Issue 19456. If so, 2.7 would also be affected.

The whole translate_path() algorithm looks over-complicated to me. One quick idea that comes to mind is to skip (or diagnose) each whole URL path component if there is any drive, directory etc syntax present, rather than making an attempt to strip it off. Perhaps check with os.path.dirname() or pathlib’s is_reserved().

One other thing I have wondered about, but I don’t know Windows well enough to be sure. Shouldn’t there be some protection against accessing special files like <http://127.0.0.1:8000/con.fusion>?

Ideally, I would like to see a common function somewhere to do this kind of path validation and conversion. There are other places even in the standard library where this is needed, which I mentioned at <https://bugs.python.org/issue21109#msg216675>.
msg262595 - (view) Author: Thomas (Thomas) * Date: 2016-03-29 09:36
Martin Panter: Regarding the warning, you appear to be correct.
However, reading the source of http.server again made me notice
_url_collapse_path(path)
which seems to have some overlap with translate_path. Also it
crashes with an IndexError if path contains '..'.

Also, yes, python 2.7's SimpleHTTPServer is affected as well.

Discarding weird paths instead of trying to repair them would change semantics, but from a user perspective, it would be easier to understand what is going on, so I'd agree with that change.

Further, I agree that it would be nice if there was some library function to safely handle path operations.
The function you proposed in https://bugs.python.org/issue21109#msg216675 and https://bitbucket.org/vadmium/pyrescene/src/34264f6/rescene/utility.py#cl-217 leaves handling path separators to the user. Maybe that should be handled as well?
The function withstood my fuzzing tests on windows, so it might be correct.
There is probably a good reason for disallowing paths that contain /dev/null but I don't know why. Could you add a word or two of documentation to explain?

A really high-level solution would be to do away with all the strings and handle paths properly as the structure that they represent instead of trying to fake all kinds of things with strings, but that is probably beyond the scope of this issue.
msg262596 - (view) Author: Xiang Zhang (xiang.zhang) * (Python committer) Date: 2016-03-29 09:46
Url handling in http.server is not perfect and there have already been some issues talking about it, i.e, issue5714, issue14567.
msg262786 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2016-04-02 03:59
Thomas: My check for os.path.devnull was just a half-hearted attempt to check for special device names like NUL on Windows. It is far from foolproof, and would fail my CON.fusion test that I mentioned above. Anyway, to address this specific bug it would be better to keep the changes to a minimum and not add any new APIs.

One slight concern I have with Philipp’s patch is the new os_path parameter. I am a bit squeamish about adding parameters that are just to help testing. Perhaps it is enough to just rely on testing this on Windows, or to monkey-patch os.path = ntpath in the test suite? What do others think?

I am posting a modified version (v3) of Philipp’s patch. This version monkey-patches os.path in the tests and avoids the os_path parameter. It is also stricter, by ignoring any path component that does not appear to be a simple file or directory name.

This version will change how some questionable URLs are handled, but I expect that all of these URLs won’t have genuine use cases. Let me know if you think it is okay or not.
msg262793 - (view) Author: Thomas (Thomas) * Date: 2016-04-02 10:07
Looks ok to me security-wise. But I just noticed that it the trailing slash is inconsistent on Windows, e.g.:

translate_path('asdf/')
==
'C:\\Users\\User\\Desktop\\temp\\asdf/' <- this slash

because path += '/' is used instead of os.path.sep. But apparently nobody complained about this yet, so it probably is not an issue.
msg262798 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2016-04-02 15:51
Windows-only tests are fine, and certainly better than adding a new parameter just for testing.

Forward slashes are valid path segment separators on Windows 99% of the time, so that'll be why nobody has complained. Personally I prefer consistency, but not strongly enough to force a change here.
msg262806 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2016-04-03 00:36
Regarding the trailing slash: this is certainly inconsistent, but one call site of translate_path() appears to depend on this being a forward slash. There seems to be confusion about whether the output is an OS path or a URL. I think this is just more thing to look at in a hypothetical future overhaul of path and URL handling (e.g. see Zhang’s links above).
msg263533 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2016-04-16 00:01
I will try to commit my patch in a couple days if there are no objections.
msg263653 - (view) Author: Roundup Robot (python-dev) Date: 2016-04-18 08:02
New changeset 8054a68dfce2 by Martin Panter in branch '3.5':
Issue #26657: Fix Windows directory traversal vulnerability with http.server
https://hg.python.org/cpython/rev/8054a68dfce2

New changeset 5d8042ab3361 by Martin Panter in branch 'default':
Issue #26657: Merge http.server fix from 3.5
https://hg.python.org/cpython/rev/5d8042ab3361

New changeset 8aa032b26552 by Martin Panter in branch '2.7':
Issue #26657: Fix SimpleHTTPServer Windows directory traversal vulnerability
https://hg.python.org/cpython/rev/8aa032b26552
msg263661 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2016-04-18 09:41
Thanks for the report and the patch.
History
Date User Action Args
2017-03-27 14:04:13hayposetpull_requests: + pull_request749
2017-03-23 12:28:46hayposetpull_requests: + pull_request686
2016-04-18 09:41:33martin.pantersetstatus: open -> closed
resolution: fixed
messages: + msg263661

stage: commit review -> resolved
2016-04-18 08:02:49python-devsetnosy: + python-dev
messages: + msg263653
2016-04-16 00:01:09martin.pantersetmessages: + msg263533
stage: patch review -> commit review
2016-04-03 00:36:14martin.pantersetmessages: + msg262806
2016-04-02 15:51:21steve.dowersetmessages: + msg262798
2016-04-02 10:07:46Thomassetmessages: + msg262793
2016-04-02 03:59:55martin.pantersetfiles: + fix-path-traversal-26657.v3.patch

messages: + msg262786
stage: patch review
2016-03-29 09:46:34xiang.zhangsetmessages: + msg262596
2016-03-29 09:36:35Thomassetmessages: + msg262595
versions: + Python 2.7
2016-03-28 23:02:38martin.pantersetmessages: + msg262585
2016-03-28 20:51:30phihagsetfiles: + fix-path-traversal-26657.patch

messages: + msg262583
2016-03-28 19:59:51phihagsetfiles: + fix-path-traversal-26657.patch

nosy: + phihag
messages: + msg262581

keywords: + patch
2016-03-28 17:32:27xiang.zhangsetnosy: + xiang.zhang
2016-03-28 15:36:47SilentGhostsetnosy: + paul.moore, tim.golden, martin.panter, zach.ware, steve.dower

components: + Windows
versions: + Python 3.5
2016-03-28 15:30:15Thomascreate