This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: Allow multilevel subdirectories in cgi_directories
Type: enhancement Stage: patch review
Components: Library (Lib) Versions: Python 3.6
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: martin.panter, orsenthil, oscar.campos, quentel, v+python
Priority: normal Keywords: patch

Created on 2012-04-12 18:21 by v+python, last changed 2022-04-11 14:57 by admin.

Files
File name Uploaded Description Edit
CGIHTTPServer.patch oscar.campos, 2012-08-04 11:49 Patch for CGIHTTPServer - Py2.7
http-server.patch oscar.campos, 2012-08-04 11:50 Patch for http.server Py3.2
http-server.patch oscar.campos, 2012-08-04 13:01 Patch for http.server (default branch) review
Messages (12)
msg158158 - (view) Author: Glenn Linderman (v+python) * Date: 2012-04-12 18:21
I notice a deficiency in is_cgi: there is no documentation requiring cgi_directories to be a single part, only that the initial value happens to be a list of two directories, each of which have only a single part or level.  The description of is_cgi, however, only requires that the strings in self.cgi_directories be prefixes of self.path, followed by "/" or end of string. While it is not at all clear that being followed by end of string would produce useful results, the description does allow for multiple parts in the directory, but the implementation does not.

Consider a potential value in an overridden or augmented cgi_directories such as '/subdomain/cgi-bin'.  The current is_cgi wouldn't handle that.

Solution: replace the following is_cgi code (from current trunk):

        collapsed_path = _url_collapse_path(self.path)
        dir_sep = collapsed_path.find('/', 1)
        head, tail = collapsed_path[:dir_sep], collapsed_path[dir_sep+1:]
        if head in self.cgi_directories:
            self.cgi_info = head, tail
            return True
        return False

with:

        cln = len( collapsed_path )
        found = False
        for ix in self.cgi_directories:
            ln = len( ix )
            print('is_cgi: %d %d - %s - %s'
                  % ( ln, cln, ix, collapsed_path ))
            if ln == cln  and  ix == collapsed_path:
                self.cgi_info = ( ix, '')
                found = True
                break
            elif ( ln < cln  and  collapsed_path[ ln ] == '/'
                   and  collapsed_path.startswith( ix )):
                self.cgi_info = ( ix, collapsed_path[ ln+1: ])
                found = True
                break
        return found
msg158163 - (view) Author: Glenn Linderman (v+python) * Date: 2012-04-12 19:02
Happily, this can be cured by overriding and replacing is_cgi, but it shouldn't be necessary to do so.
msg160062 - (view) Author: Pierre Quentel (quentel) * Date: 2012-05-06 07:46
Hi Glenn, good to hear from you ;-)

I think the fix can be simplified replacing

dir_sep = collapsed_path.find('/', 1)

by

dir_sep = collapsed_path.rfind('/', 1)
msg160111 - (view) Author: Glenn Linderman (v+python) * Date: 2012-05-06 21:57
Hi Pierre, Nice to see your name pop up again.

Your suggestion is certainly simpler... but unfortunately, too much simpler.  One reason, is that in configuring a path to contain CGI files, the CGI files are allowed to be there, or anywhere deeper in the tree.  So configuring "/http/bin" should allow the CGI file to be any of:

/http/bin/some.cgi
/http/bin/application1/other.cgi
/http/bin/app2/subapp/third.cgi

and many more.

The second reason, is that once /http/bin is determined to be the path prefix, then the search down the tree is done for the cgi file.  Now let's say URL looks like:

http://server.com/http/bin/app2/subapp/third.cgi/more/path/info/params

It is appropriate to find third.cgi and execute it as a CGI file, and pass it as PATHINFO /more/path/info/params !  Yet, using your suggested over-simplification, "params" would be considered the CGI file (if it even exists), and /http/bin/app2/subapp/third.cgi/more/path/info certainly wouldn't exactly match /http/bin... so neither params nor third.cgi would get a chance to be executed.
msg160119 - (view) Author: Pierre Quentel (quentel) * Date: 2012-05-07 06:14
Thanks for the explanation

I still think that the patch can be simplified, not using path lengths and the "found" flag

collapsed_path = _url_collapse_path(self.path)
for head in self.cgi_directories:
    if head==collapsed_path:
        self.cgi_info = (head,'')
        return True
    elif collapsed_path.startswith(head) \
        and collapsed_path[len(head)]=='/':
            self.cgi_info = head, collapsed_path[len(head)+1:]
            return True
return False

BTW the last "return False" is rather useless since is_cgi() is only used in tests like "if is_cgi()"
msg160169 - (view) Author: Glenn Linderman (v+python) * Date: 2012-05-07 21:15
Hi Pierre,

You are right, the "found" variable is not needed, I guess the reason I coded it that way, is I had some validation code before the return, during testing, so it was easier not to have it in three places.

Regarding precalculating ln, I don't know how Python is coded internally, so it seems that string compares could be lengthier integer compares, and even in your code the length is calculated sometimes.  So it is mostly an order of operations optimization... I didn't attempt to benchmark it with various URL and cgi_directories values, including at least both some with large prefix matches, and some without... I didn't figure it would matter much, but if you think it does, then feel free to benchmark.
msg160187 - (view) Author: Pierre Quentel (quentel) * Date: 2012-05-08 06:22
Hi Glenn,

My proposal was not about optimization, I just thought that "if x==y" is simpler than "if len(x)==len(y) and x==y". Since we don't expect that there will be many directories in the list, I don't think optimizing is so important. But it doesn't matter to me really, the most important is to have the bug fixed

Can you propose a diff file so that the committers can review it ?
msg167402 - (view) Author: Oscar Campos (oscar.campos) * Date: 2012-08-04 11:49
Greetings,

I did a diff patch based on the previous work of Glenn Linderman and Pierre Quentel.

I did some test and is working well so now the implementation is doing what the docstring says. I already passed the full test suite without problems.

I add two patches:
CGIHTTPServer.patch is for Python 2.7
http-server.patch is for Python 3.2

This is my first contribution to the Python Core althrough I did follow the Developers Gruide and I already send my Contributor Agreement to the PSF maybe there is something I did wrong, if this is the case just tell me, I'm here to help and learn.

Regards.
Oscar Campos.
msg167420 - (view) Author: Glenn Linderman (v+python) * Date: 2012-08-04 16:52
Thanks for the patch, Oscar, I've not had more time to follow up on this issue, and haven't yet learned how to generate the patches.

While you dropped the "return False" line, which surprised me, the implicit "return None" is sufficiently false, that there no real concern with the correctness of the code in the patch.

Hopefully, with your contribution, this issue can progress to completion.
msg252076 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2015-10-02 01:02
This smells like a new feature to me, because it is clearly not supported by the existing code. If you wanted to fix the existing released versions of Python, I think it would be better to clarify the documentation.
msg252175 - (view) Author: Glenn Linderman (v+python) * Date: 2015-10-02 23:10
Martin, thanks for the recent activity in this area. Sorry I've not yet learned how to submit patches. python_dev keeps busy changing the process and tools, and I keep busy with other projects, having patched a version of http-server well enough for my personal needs.

Your analysis that this is a new feature because the code doesn't support it, and the documentation is vague has some merit. 

On the other hand, the reason I called it a behavior bug in the first place is comparison to other web servers that support multi-part paths to cgi scripts indicates it is clearly deficient by comparison... and the documentation isn't explicit enough to say so, so it can be considered a "deficient" if not "buggy" implementation. Certainly, any attempt to port a real web site that depends heavily on cgi scripts to http-server would be extremely likely to fail due to this deficiency, that is not documented as a limitation of the "generally expected" abilities of a web server.

It'd sure be nice to fix it, whether it is called a bug or an enhancement, and the fix is in the comments above.
msg252206 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2015-10-03 11:46
The patch looks like it also adds support for individual script file names in cgi_directories. If that is the case, I suspect it would be tricked by a query string (e.g. /some.cgi?query).

I object to removing the final “return False”. The return value of is_cgi() is significant, and it good to be explicit about what it is returning.

It would be helpful to get a test case. Or multiple test cases if I am right about the individual file support.

And the documentation needs updating saying what is supported, with a “Version changed” notice. It would also be good to be specific about the syntax for cgi_directories. E.g. must start with a slash, cannot use Windows backslashes, must be a normalized path, not URL encoded. Is the root directory allowed?
History
Date User Action Args
2022-04-11 14:57:29adminsetgithub: 58770
2015-10-03 11:46:04martin.pantersetmessages: + msg252206
stage: patch review
2015-10-02 23:10:10v+pythonsetmessages: + msg252175
2015-10-02 01:02:47martin.pantersetnosy: + martin.panter
title: is_cgi doesn't function as documented for cgi_directories -> Allow multilevel subdirectories in cgi_directories
messages: + msg252076

versions: + Python 3.6, - Python 2.7, Python 3.2, Python 3.3
type: behavior -> enhancement
2012-08-04 16:52:43v+pythonsetmessages: + msg167420
2012-08-04 13:01:52oscar.campossetfiles: + http-server.patch
2012-08-04 11:50:10oscar.campossetfiles: + http-server.patch
2012-08-04 11:49:25oscar.campossetfiles: + CGIHTTPServer.patch

nosy: + oscar.campos
messages: + msg167402

keywords: + patch
2012-05-08 06:22:48quentelsetmessages: + msg160187
2012-05-08 03:37:59eric.araujosetversions: - Python 2.6, Python 3.1
2012-05-07 21:15:35v+pythonsetmessages: + msg160169
2012-05-07 06:14:13quentelsetmessages: + msg160119
2012-05-06 21:57:49v+pythonsetmessages: + msg160111
2012-05-06 07:46:11quentelsetnosy: + quentel
messages: + msg160062
2012-04-12 19:02:58v+pythonsettype: behavior
messages: + msg158163
components: + Library (Lib)
versions: + Python 2.6, Python 3.1, Python 2.7, Python 3.2, Python 3.3
2012-04-12 18:23:14v+pythonsetnosy: + orsenthil
2012-04-12 18:21:06v+pythoncreate