classification
Title: http.server.is_cgi fails to handle CGI URLs containing PATH_INFO
Type: behavior Stage: resolved
Components: Library (Lib) Versions: Python 3.1, Python 3.2
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: orsenthil Nosy List: Giovanni.Funchal, facundobatista, fdrake, orsenthil, python-dev, v+python
Priority: normal Keywords:

Created on 2010-11-21 07:31 by v+python, last changed 2012-04-11 18:39 by orsenthil. This issue is now closed.

Files
File name Uploaded Description Edit
file1.py orsenthil, 2012-04-10 19:35
time_test.py v+python, 2012-04-11 09:30
Messages (22)
msg121876 - (view) Author: Glenn Linderman (v+python) * Date: 2010-11-21 07:31
is_cgi doesn't properly handle PATH_INFO parts of the path.  The Python2.x CGIHTTPServer.py had this right, but the introduction and use of _url_collapse_path_split broke it.

_url_collapse_path_split splits the URL into a two parts, the second part is guaranteed to be a single path component, and the first part is the rest.  However, URLs such as

/cgi-bin/foo.exe/this/is/PATH_INFO/parameters

can and do want to exist, but the code in is_cgi will never properly detect that /cgi-bin/foo.exe is the appropriate executable, and the rest should be PATH_INFO.

This used to work correctly in the precedecessor CGIHTTPServer.py code in Python 2.6, so is a regression.
msg122259 - (view) Author: Glenn Linderman (v+python) * Date: 2010-11-24 03:50
Here is a replacement for the body of is_cgi that will work with the current _url_collapse_path_split function, but it seems to me that it is ineffecient to do multiple splits and joins of the path between the two functions.

            splitpath = server._url_collapse_path_split(self.path)
            # more processing required due to possible PATHINFO parts
            # not clear above function really does what is needed here,
            # nor just how general it is!
            splitpath = '/'.join( splitpath ).split('/', 2 )
            head = '/' + splitpath[ 1 ]
            tail = splitpath[ 2 ]
            if head in self.cgi_directories:
                self.cgi_info = head, tail
                return True
            return False
msg154718 - (view) Author: Giovanni Funchal (Giovanni.Funchal) Date: 2012-03-01 23:51
This is still an issue as of python 3.2.2 and is affecting me.
msg154719 - (view) Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2012-03-02 00:05
Oh Sorry. I shall fix this by this weekend.
msg154720 - (view) Author: Glenn Linderman (v+python) * Date: 2012-03-02 01:00
issue 13893 contains code that fixes this, and several other open issues. Sadly, I created that code by "debugging and rewriting until it worked", and only then teased apart the specific, separable issues that I had debugged and fixed, and created issues.

I'm not up to speed on the Python development process, but feel free to borrow any or all of my code in issue 13893, which contains the fixes for all the following issues:

issue 10483
issue 10484 (this issue)
issue 10485
issue 10486 (creates more standard Environment variables, not sure all)
issue 10487

I would certainly like to see all of these issues fixed, rather than continue to maintain my own code in parallel to various Python releases.  Whether they are fixed with my code, or with other code, is immaterial to me, but my code has been running for over a year in my environments without discovering additional bugs.
msg155996 - (view) Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2012-03-16 06:23
To note in the tracker, the original 2.6 code was changed in Issue 2254 and the relevant changeset is c6c4398293bd

I find, Glenn's suggestion a possibly okay solution for PATH_INFO problem, keeping in tact the security issue the the previous change dealt with.
msg156001 - (view) Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2012-03-16 07:01
Actually no.  I revert back my previous statement. Meddling with splitpath breaks all the tests.
msg156007 - (view) Author: Glenn Linderman (v+python) * Date: 2012-03-16 07:51
I'm not sure what tests break or why, I suppose it is the ones for the security patch.  I'm not sure which code you are referring to... the code in msg122259 I tested only cursorily, the (different) code in issue 13893 is what I'm running today, and so has been tested more heavily, but I haven't run the regression tests (haven't figured out how).

The latter code fixes the _url_collapse_path_split function, instead of working around it, which the code in msg122259 tries to do.

I would be highly interested if it fails any tests, and why, and would be glad to help analyze and fix.
msg156010 - (view) Author: Roundup Robot (python-dev) Date: 2012-03-16 08:15
New changeset bab9f29c93fd by Senthil Kumaran in branch '2.7':
2.7 - Issue #10484: Fix the CGIHTTPServer's PATH_INFO handling problem
http://hg.python.org/cpython/rev/bab9f29c93fd

New changeset 88c86869ce92 by Senthil Kumaran in branch '3.2':
closes issue10484 - Fix the http.server's cgi PATH_INFO handling problem
http://hg.python.org/cpython/rev/88c86869ce92

New changeset 13c44ad094b4 by Senthil Kumaran in branch 'default':
closes issue10484 - Fix the http.server's cgi PATH_INFO handling problem
http://hg.python.org/cpython/rev/13c44ad094b4
msg156011 - (view) Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2012-03-16 08:18
I treated this as a regression from 2.6. A cgi url with /cgi-bin/hello.py/hello/world would have PATH_INFO as '/hello/world' 
I saw some apache specs saying that it should be 'hello/world', but I relied on python2.6's behavior with had initial '/'.

Glenn, I referred to Lib/test/test_httpservers.py.

Thanks for raising this and other issues (and patches)!
msg156012 - (view) Author: Glenn Linderman (v+python) * Date: 2012-03-16 08:25
In reviewing my code in this area, I also see that in addition to fixing _url_collapse_path_split, I override the location that uses it, which is the is_cgi function.  Here is my code for the override, which actually creates a proper PATH_INFO string:

        def is_cgi(self):
            """Test whether self.path corresponds to a CGI script.

            Returns True and updates the cgi_info attribute to the tuple
            (dir, rest) if self.path requires running a CGI script.
            Returns False otherwise.

            If any exception is raised, the caller should assume that
            self.path was rejected as invalid and act accordingly.

            The default implementation tests whether the normalized url
            path begins with one of the strings in self.cgi_directories
            (and the next character is a '/' or the end of the string).

            """

            splitpath = server._url_collapse_path_split(self.path)
            # more processing required due to possible PATHINFO parts
            # not clear above function really does what is needed here,
            # nor just how general it is!
            splitpath = '/'.join( splitpath ).split('/', 2 )
            head = '/' + splitpath[ 1 ]
            tail = splitpath[ 2 ]
            if head in self.cgi_directories:
                self.cgi_info = head, tail
                return True
            return False

I have no idea what applications might depend on the improper handling of PATH_INFO that the current code is performing, so that is why I applied my fix for that in my overridden code, rather than in the server.py source file.

It may be that the actual fix for this issue is in the overridden code above (but the fix to _url_collapse_path_split also seemed necessary, there was a corner case that it did incorrectly, but after 16 months, I couldn't tell you what that corner case was, any more.

Yes, the biggest issue here was the regression from 2.6, the security fix seemed to break the PATH_INFO feature.
msg156013 - (view) Author: Glenn Linderman (v+python) * Date: 2012-03-16 08:31
Senthil, the patch you submitted breaks the encapsulation of cgi_directories -- this should be changeable per particular application via subclassing or assignment, but you've built those directories into the code.  Perhaps you should take a look at my code for is_cgi (which I was trying to post while you were posting the patches, sorry I was slow).

Perhaps there should be one more test, which actually changes the cgi_directories, and tests a few more things, to prove that the code does not hard code those directories (it shouldn't).
msg156014 - (view) Author: Glenn Linderman (v+python) * Date: 2012-03-16 08:43
Another issue with the patch, is that it doesn't do .. and . collapsing on the PATH_INFO part of the path.

It is possible for a path like

/cgi-bin/script.py/../../plain-file.html

to be passed to the server.  I guess the question is if it should serve plain-file.html or if it should pass "../../plain-file.html" to script.py as its PATH_INFO. I would think the former would be appropriate.  I would have to do research to determine if some standard states otherwise.
msg156015 - (view) Author: Glenn Linderman (v+python) * Date: 2012-03-16 08:56
I just tested what Apache does with such a path as /cgi-bin/script.py/../../plain-file.html, and it serves the plain-file.html.
msg156064 - (view) Author: Glenn Linderman (v+python) * Date: 2012-03-16 18:09
Senthil, from you patch, I discovered where the test_httpservers.py lives, and added 

            '/cgi-bin/file1.py/../../a': ('/', 'a'),

to a local copy, and it worked fine against whatever version of the server and tests it was running against... but that was in test_url_collapse_path_split ... I haven't figured out the full environment of the testing, or even if it launches an HTTP SERVER or only unittests particular functions within it, but it seems that that particular test is just testing the one function.  I'm not clear on where the tests actually test the activities of is_cgi or the creation of PATH_INFO, if that occurs. What time I've spent working with Python can be characterized as 99% application, 0.999% Python source to figure out how things work (or fail), .001% test environment (just now). So I'm pretty ignorant of the test environment, although I've spent 30+ years programming, only about 4 has been in Python, and that not full time.

It is not clear to me, though, that the test you added:

    '/cgi-bin/file1.py/PATH-INFO': ('/cgi-bin', 'file1.py/PATH-INFO'),

should be the expected results of _url_collapse_path_split... 

I would expect 

    '/cgi-bin/file1.py/a/b/c/../../d': ('/cgi-bin/file1.py/a', 'd'),

but I don't think it would, with your present patch. It looks like it would return ('/cgi-bin', 'file1.py/a/b/c/../../d') which reintroduces the security problem: the '..' stuff has not been processed. I haven't applied your patch to actually run it, this is just from inspection, and I may have missed something.
msg157977 - (view) Author: Roundup Robot (python-dev) Date: 2012-04-10 19:19
New changeset 89eeaa18700f by Senthil Kumaran in branch '2.7':
fix the incorrect changes made for PATH_INFO value - Issue10484
http://hg.python.org/cpython/rev/89eeaa18700f

New changeset 827a4062b1d6 by Senthil Kumaran in branch '3.2':
3.2- fix the incorrect changes made for PATH_INFO value - Issue10484
http://hg.python.org/cpython/rev/827a4062b1d6

New changeset 8bcc5768bc05 by Senthil Kumaran in branch 'default':
merge - fix the incorrect changes made for PATH_INFO value - Issue10484
http://hg.python.org/cpython/rev/8bcc5768bc05
msg157979 - (view) Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2012-04-10 19:35
I had to carefully review a lot of changes for this. In addition, I did setup a apache and tested the behaviors too. 

Few notes - 
-  The test for _url_collapse_path_split is not directly helpful, especially for PATH_INFO. I removed my previous changes I had made as I think, it is a mistake to test PATH_INFO in there.
- There is not a test for path_info, it should be added. I could test only on local apache setup.
- Coming the changes itself, the previous _url_collapse_path_split was just fine for what it is supposed to do. Return head and tail for certain definitions of parsing (removing . and .. and tail is last part and rest is head).
- I incorporated Glenn Linderman logic in is_cgi to have explicit head and tail. Glenn's comment and cgi directories could overridden is important and in that ascept the previous commit had to be changed. I also looked at changes in the patches contained in issue 13893, but I found that those broke behavior or exhibited the security issue again.

taking care of all these, I have the made the changes to the cgi module and also verified pratically that PATH_INFO is indeed sent correctly. 

Ran python -m CGIHttpServer and placed cgi-bin/file1.py  and sent some requests to verify PATH_INFO. The values were proper.

With this, I think this issue can be closed. The rest of the cgi changes can be handled in their respective issues. A test to PATH_INFO could just be added.
msg158000 - (view) Author: Glenn Linderman (v+python) * Date: 2012-04-11 00:25
Senthil, thanks for your work on this.  Regarding:

I also looked at changes in the patches contained in issue 13893, but I found that those broke behavior or exhibited the security issue again.

I'd be curious to know what problems you discovered, since I had hoped I had fixed them all (and know I fixed enough to work in my environment).

I'll review your code in the next while, and attempt to include it in mine, and test it in my environment... I really don't want to maintain my own code, but do need functional code... hopefully once the other issues are fixed I can just use released code.
msg158010 - (view) Author: Glenn Linderman (v+python) * Date: 2012-04-11 06:35
A minor detail of efficiency:

_url_collapse_path_split is used only one place, from is_cgi.
_url_collapse_path_split splits the path, and is_cgi now puts it back together.  This seems inefficient.  Would it not be better for _url_collapse_path_split to be renamed to _url_collapse_path and simply return a single value, the path canonicalized (.. removed)?

Then is_cgi would not need to put it back together before taking it apart another way.
msg158015 - (view) Author: Glenn Linderman (v+python) * Date: 2012-04-11 09:30
I note that there is no test for tail_part == '.'.  I suggest adding a couple, such as the following which I added to my local copy for testing of the next item:

            '/a/b/.': ('/a/b', ''),
            '/a/b/c/../d/e/../../../../f/../.': ('/', ''),

Given your comment that you found bugs in my code, I figured out how to run the tests against my code, and found the bugs. Here is a slightly shorter, more efficient (it cuts 40% off the execution time, see time_test.py), and to me much clearer version of _url_collapse_path_split, and it passes all the tests:

def _url_collapse_path_split(path):
    # Similar to os.path.split(os.path.normpath(path)) but specific to URL
    # path semantics rather than local operating system semantics.
    path_parts = path.split('/')
    head_parts = []
    for part in path_parts[:-1]:
        if part == '..':
            head_parts.pop() # IndexError if more '..' than prior parts
        elif part and part != '.':
            head_parts.append( part )
    if path_parts:
        tail_part = path_parts.pop()
        if tail_part:
            if tail_part == '..':
                head_parts.pop()
                tail_part = ''
            elif tail_part == '.':
                tail_part = ''
    else:
        tail_part = ''        
    return ('/' + '/'.join(head_parts), tail_part)
msg158067 - (view) Author: Roundup Robot (python-dev) Date: 2012-04-11 18:37
New changeset c67efb8ffca4 by Senthil Kumaran in branch '2.7':
Issue 10484 - Incorporate improvements to CGI module - Suggested by Glenn Linderman. Refactor code and tests
http://hg.python.org/cpython/rev/c67efb8ffca4

New changeset fc001124a3ee by Senthil Kumaran in branch '3.2':
3.2 - Issue 10484 - Incorporate improvements to CGI module - Suggested by Glenn Linderman. Refactor code and tests
http://hg.python.org/cpython/rev/fc001124a3ee

New changeset 23f648d7053b by Senthil Kumaran in branch 'default':
merge to default - Issue 10484 - Incorporate improvements to CGI module - Suggested by Glenn Linderman. Refactor code and tests
http://hg.python.org/cpython/rev/23f648d7053b
msg158068 - (view) Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2012-04-11 18:39
Hello Glenn, 

Your suggestions were valid. I have made those changes in the code. Thanks! I think, we can close this bug now and move over to others.

Thanks,
Senthil
History
Date User Action Args
2012-04-11 18:39:10orsenthilsetstatus: open -> closed

messages: + msg158068
2012-04-11 18:37:43python-devsetmessages: + msg158067
2012-04-11 09:30:13v+pythonsetfiles: + time_test.py

messages: + msg158015
2012-04-11 06:35:23v+pythonsetmessages: + msg158010
2012-04-11 00:25:15v+pythonsetmessages: + msg158000
2012-04-10 19:35:27orsenthilsetfiles: + file1.py

messages: + msg157979
2012-04-10 19:19:12python-devsetmessages: + msg157977
2012-03-16 18:09:56v+pythonsetmessages: + msg156064
2012-03-16 08:56:55v+pythonsetmessages: + msg156015
2012-03-16 08:43:20v+pythonsetmessages: + msg156014
2012-03-16 08:31:28v+pythonsetstatus: closed -> open

messages: + msg156013
2012-03-16 08:25:15v+pythonsetmessages: + msg156012
2012-03-16 08:18:19orsenthilsetmessages: + msg156011
2012-03-16 08:15:34python-devsetstatus: open -> closed

nosy: + python-dev
messages: + msg156010

resolution: fixed
stage: resolved
2012-03-16 07:51:40v+pythonsetmessages: + msg156007
2012-03-16 07:01:22orsenthilsetmessages: + msg156001
2012-03-16 06:23:33orsenthilsetmessages: + msg155996
2012-03-02 01:00:12v+pythonsetmessages: + msg154720
2012-03-02 00:05:10orsenthilsetassignee: orsenthil
messages: + msg154719
2012-03-01 23:51:13Giovanni.Funchalsetnosy: + Giovanni.Funchal
messages: + msg154718
2010-11-24 03:50:32v+pythonsetmessages: + msg122259
2010-11-21 16:59:36pitrousetnosy: + fdrake, facundobatista, orsenthil

versions: + Python 3.1
2010-11-21 07:31:56v+pythoncreate