classification
Title: http.server doesn't set all CGI environment variables
Type: behavior Stage: resolved
Components: Library (Lib) Versions: Python 3.10
process
Status: closed Resolution: wont fix
Dependencies: Superseder:
Assigned To: orsenthil Nosy List: demian.brecht, facundobatista, fdrake, maarten, martin.panter, orsenthil, quentel, remi.lapeyre, v+python
Priority: normal Keywords: patch

Created on 2010-11-21 07:41 by v+python, last changed 2020-12-01 22:40 by orsenthil. This issue is now closed.

Pull Requests
URL Status Linked Edit
PR 23604 merged orsenthil, 2020-12-01 22:29
Messages (13)
msg121878 - (view) Author: Glenn Linderman (v+python) * Date: 2010-11-21 07:41
HTTP_HOST, HTTP_PORT, and REQUEST_URI are variables that my CGI scripts use, but they are not available from http.server or CGIHTTPServer (until I added them).

There may be more standard variables that are not set; I didn't attempt to enumerate the whole list.
msg122258 - (view) Author: Glenn Linderman (v+python) * Date: 2010-11-24 03:41
Took a little more time to do a little more analysis on this one.  Compared a sample query via Apache on Linux vs http.server, then looked up the CGI RFC for more info:

DOCUMENT_ROOT: ...
GATEWAY_INTERFACE: CGI/1.1
HTTP_ACCEPT: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
HTTP_ACCEPT_CHARSET: ISO-8859-1,utf-8;q=0.7,*;q=0.7
HTTP_ACCEPT_ENCODING: gzip,deflate
HTTP_ACCEPT_LANGUAGE: en-us,en;q=0.5
HTTP_CONNECTION: keep-alive
HTTP_COOKIE: ...
HTTP_HOST: ...
HTTP_KEEP_ALIVE: 115
HTTP_USER_AGENT: Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.2.10) Gecko/20100914 Firefox/3.6.10
PATH: /usr/local/bin:/usr/bin:/bin
PATH_INFO: ...
PATH_TRANSLATED: ...
QUERY_STRING: 
REMOTE_ADDR: 173.75.100.22
REMOTE_PORT: 50478
REQUEST_METHOD: GET
REQUEST_URI: ...
SCRIPT_FILENAME: ...
SCRIPT_NAME: ...
SERVER_ADDR: ...
SERVER_ADMIN: ...
SERVER_NAME: ...
SERVER_PORT: ...
SERVER_PROTOCOL: HTTP/1.1
SERVER_SIGNATURE: <address>Apache Server at rkivs.com Port 80</address>

SERVER_SOFTWARE: Apache
UNIQUE_ID: TLEs8krc24oAABQ1TIUAAAPN

Above from Apache, below from http.server

GATEWAY_INTERFACE: CGI/1.1
HTTP_USER_AGENT: Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.2.12) Gecko/20101026 Firefox/3.6.12
PATH_INFO: ...
PATH_TRANSLATED: ...
QUERY_STRING: ...
REMOTE_ADDR: 127.0.0.1
REQUEST_METHOD: GET
SCRIPT_NAME: ...
SERVER_NAME: ...
SERVER_PORT: ...
SERVER_PROTOCOL: HTTP/1.0
SERVER_SOFTWARE: SimpleHTTP/0.6 Python/3.2a4

Analysis of missing variables between Apache and http.server:

DOCUMENT_ROOT
HTTP_ACCEPT
HTTP_ACCEPT_CHARSET
HTTP_ACCEPT_ENCODING
HTTP_ACCEPT_LANGUAGE
HTTP_CONNECTION
HTTP_COOKIE
HTTP_HOST
HTTP_KEEP_ALIVE
HTTP_PORT
PATH
REQUEST_URI
SCRIPT_FILENAME
SERVER_ADDR
SERVER_ADMIN


Additional variables mentioned in RFC 3875, not used for my test requests:

AUTH_TYPE
CONTENT_LENGTH
CONTENT_TYPE
REMOTE_IDENT
REMOTE_USER
msg158532 - (view) Author: Glenn Linderman (v+python) * Date: 2012-04-17 06:05
Reading the CGI 1.1 spec, it says:

   The QUERY_STRING value provides the query-string part of the
   Script-URI.  (See section 3.3).

   The server MUST set this variable; if the Script-URI does not include
   a query component, the QUERY_STRING MUST be defined as an empty
   string ("").

Therefore the code in run_cgi that says:

        if query:
            env['QUERY_STRING'] = query

should have the conditional removed.
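
A minimal sketch of the unconditional assignment, written outside of http.server. The helper names `split_query` and `query_env` are assumptions made for illustration; this is not the actual run_cgi code, only the behaviour the quoted RFC text asks for.

```python
def split_query(request_path):
    """Split a request path into (path, query); query is "" when absent."""
    path, _, query = request_path.partition("?")
    return path, query


def query_env(request_path):
    """Build only the QUERY_STRING part of a CGI environment (illustrative)."""
    env = {}
    _, query = split_query(request_path)
    # Set unconditionally: the RFC text quoted above requires an empty
    # string when the Script-URI has no query component.
    env["QUERY_STRING"] = query
    return env


if __name__ == "__main__":
    print(query_env("/cgi-bin/test.py?a=1&b=2"))
    print(query_env("/cgi-bin/test.py"))
```

With this shape, a script can always read `os.environ["QUERY_STRING"]` without first checking whether the key exists.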
msg235797 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2015-02-12 04:58
Issue 5054 is for HTTP_ACCEPT
msg329766 - (view) Author: Rémi Lapeyre (remi.lapeyre) * Date: 2018-11-12 22:10
The reference given in https://github.com/python/cpython/blob/b36b0a3765bcacb4dcdbf12060e9e99711855da8/Lib/http/server.py#L1074 is not accessible anymore.

I think we should replace it by https://tools.ietf.org/html/rfc3875#section-4.1
msg329771 - (view) Author: Rémi Lapeyre (remi.lapeyre) * Date: 2018-11-12 23:10
AUTH_TYPE, CONTENT_LENGTH, CONTENT_TYPE, REMOTE_USER are present

REMOTE_IDENT is not but I'm not sure it's worth adding.

I can send a PR to add REMOTE_HOST and remove the condition for QUERY_STRING.

Otherwise, I don't think the other environment variables should be added; they are implementation-dependent and not defined in RFC 3875.

Should we close this issue?
msg329815 - (view) Author: Glenn Linderman (v+python) * Date: 2018-11-13 08:11
Rémi Lapeyre, glad to see your interest here, as this is an old and languishing bug.

I would have hoped based on my input, that had there been anyone that was maintaining the Python web server code, that they might have done a more complete analysis than I did.

I note the document you reference is from 2004 (and I referenced it too), and doesn't include mention of the HTTP_COOKIE header, yet that header is frequently used in practical web applications. Apache supports it (as noted). My point is that it is not clear that conforming to RFC 3875 from 2004 is really sufficient to build a useful web server. While it is true that my references to Apache are to a particular implementation, it is a widespread implementation, which other implementations attempt to be compatible with, indicating that being reasonably compatible with Apache would seem to be a good thing for other web server implementations. A few more environment variables don't cost a lot, and seem to be useful. I don't know if some or all of the additional environment variables implemented by Apache are standardized by RFC or other standards, or whether they are common practice, or unique to Apache, nor where such standards might be found. But I would hope a maintainer of the Python web server would be interested in sorting out such environment variables and making that determination, rather than relying on a 14-year-old RFC as the definitive source, when web technologies have progressed significantly in the last 14 years. I would agree that variables that are unique to Apache might not want to be implemented, but on the other hand, with other implementations following Apache's lead, there may be few that are unique to Apache.
msg329822 - (view) Author: Rémi Lapeyre (remi.lapeyre) * Date: 2018-11-13 10:11
Hi Glenn, I'm not aware of a document that defines CGI better than the RFC, and I don't know it well enough to digress from the published standard (even if it is not what is done today, as I don't know the current practices well enough).

Here are the variables defined by nginx 1.10.3 on Debian:

    fastcgi_param  QUERY_STRING       $query_string;
    fastcgi_param  REQUEST_METHOD     $request_method;
    fastcgi_param  CONTENT_TYPE       $content_type;
    fastcgi_param  CONTENT_LENGTH     $content_length;

    fastcgi_param  SCRIPT_NAME        $fastcgi_script_name;
    fastcgi_param  REQUEST_URI        $request_uri;
    fastcgi_param  DOCUMENT_URI       $document_uri;
    fastcgi_param  DOCUMENT_ROOT      $document_root;
    fastcgi_param  SERVER_PROTOCOL    $server_protocol;
    fastcgi_param  REQUEST_SCHEME     $scheme;
    fastcgi_param  HTTPS              $https if_not_empty;

    fastcgi_param  GATEWAY_INTERFACE  CGI/1.1;
    fastcgi_param  SERVER_SOFTWARE    nginx/$nginx_version;

    fastcgi_param  REMOTE_ADDR        $remote_addr;
    fastcgi_param  REMOTE_PORT        $remote_port;
    fastcgi_param  SERVER_ADDR        $server_addr;
    fastcgi_param  SERVER_PORT        $server_port;
    fastcgi_param  SERVER_NAME        $server_name;

    # PHP only, required if PHP was built with --enable-force-cgi-redirect
    fastcgi_param  REDIRECT_STATUS    200;

Someone who knows CGI better than me may know the way forward.
msg329851 - (view) Author: Pierre Quentel (quentel) * Date: 2018-11-13 15:29
The QUERY_STRING value is always set by the code at lines 1135-1137 of http.server:

    for k in ('QUERY_STRING', 'REMOTE_HOST', 'CONTENT_LENGTH',
              'HTTP_USER_AGENT', 'HTTP_COOKIE', 'HTTP_REFERER'):
        env.setdefault(k, "")
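
To illustrate why this loop guarantees the key, here is a small standalone sketch of the same `setdefault` pattern. The wrapper name `apply_defaults` and the sample cookie value are assumptions for illustration; only the tuple of keys comes from the snippet above.

```python
DEFAULTED_KEYS = ('QUERY_STRING', 'REMOTE_HOST', 'CONTENT_LENGTH',
                  'HTTP_USER_AGENT', 'HTTP_COOKIE', 'HTTP_REFERER')


def apply_defaults(env):
    """Fill in "" for any listed key that is missing; keep existing values."""
    for k in DEFAULTED_KEYS:
        # setdefault only writes when the key is absent.
        env.setdefault(k, "")
    return env


if __name__ == "__main__":
    # Hypothetical environment: a cookie was sent, but no query string.
    env = apply_defaults({'HTTP_COOKIE': 'sid=abc'})
    print(repr(env['QUERY_STRING']))
    print(repr(env['HTTP_COOKIE']))
```

So even when an earlier `if query:` branch is skipped, the script still sees QUERY_STRING as an empty string.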

The RFC for CGI has not evolved since 2004, probably because the technology is stable, and also because other, more efficient protocols have been defined to avoid the "CGI overhead" (FastCGI for instance).

I think that http.server should only implement the "meta-variables" defined in RFC 3875:
- AUTH_TYPE  CONTENT_LENGTH CONTENT_TYPE  GATEWAY_INTERFACE PATH_INFO  PATH_TRANSLATED QUERY_STRING  REMOTE_ADDR REMOTE_HOST  REMOTE_IDENT REMOTE_USER  REQUEST_METHOD SCRIPT_NAME  SERVER_NAME SERVER_PORT  SERVER_PROTOCOL SERVER_SOFTWARE. Some of these must always be set (eg QUERY_STRING, REQUEST_METHOD, SERVER_NAME...) but for other ones, there are conditions (for instance for CONTENT_LENGTH: "The server MUST set this meta-variable if and only if the request is accompanied by a message-body entity")
- "protocol-specific meta variables" : for HTTP, variables determined by the HTTP request headers such as HTTP_COOKIE (cf section 4.1.18.  Protocol-Specific Meta-Variables)

Other meta variables are probably beyond the scope of a module in the standard library.

In short, in my opinion the issue can be closed.
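
As a rough aid for checking an environment against the list above, the following sketch reports which of the listed RFC 3875 meta-variables are absent. The function name `missing_meta_variables` is an assumption for illustration. An absent name is not necessarily a conformance problem, since several of these variables are only set under the conditions the RFC describes.

```python
# Meta-variable names as listed in the message above (RFC 3875).
RFC3875_META_VARIABLES = (
    "AUTH_TYPE", "CONTENT_LENGTH", "CONTENT_TYPE", "GATEWAY_INTERFACE",
    "PATH_INFO", "PATH_TRANSLATED", "QUERY_STRING", "REMOTE_ADDR",
    "REMOTE_HOST", "REMOTE_IDENT", "REMOTE_USER", "REQUEST_METHOD",
    "SCRIPT_NAME", "SERVER_NAME", "SERVER_PORT", "SERVER_PROTOCOL",
    "SERVER_SOFTWARE",
)


def missing_meta_variables(environ):
    """Return the listed meta-variable names that are not keys of environ."""
    return [name for name in RFC3875_META_VARIABLES if name not in environ]


if __name__ == "__main__":
    import os
    # Inside a CGI script this shows what the server did not provide.
    print(missing_meta_variables(os.environ))
```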
msg329863 - (view) Author: Glenn Linderman (v+python) * Date: 2018-11-13 19:37
That's interesting, Pierre. I hadn't read the RFC carefully enough to realize that many of the "missing" variables from Apache are HTTP headers, and that section 4.1.18 tells how to convert HTTP headers to meta-variables.

The code in server.py 3.6 (Sorry, I should check the master branch) picks specific HTTP_ headers to include, rather than including them all per the rules. Doing the latter would go a long way toward being more compatible with Apache. I don't know if Rémi got his NGINX list from source code (looks like it) and if maybe NGINX also defines meta variables from the HTTP_ headers, that are not listed in the header file he seems to be quoting.

Unless the code has already been improved for Python 3.7, I think there is still some work to do to make server.py conform even to the RFC, if not be compatible with Apache.
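
A minimal sketch of the section 4.1.18 naming rule (upper-case the header name, replace "-" with "_", prepend "HTTP_"), applied to every header rather than a hand-picked few. The function names, the skip list, and the choice to join repeated headers with ", " are assumptions made for illustration, based on one reading of the RFC; this is not what server.py does.

```python
# Headers this sketch leaves out: authentication data, and values that are
# already exposed as CONTENT_LENGTH / CONTENT_TYPE (assumed reading of 4.1.18).
SKIPPED_HEADERS = {"authorization", "content-length", "content-type"}


def header_to_meta_variable(header_name):
    """Convert an HTTP header field name to its CGI meta-variable name."""
    return "HTTP_" + header_name.upper().replace("-", "_")


def headers_to_environ(headers):
    """Map (name, value) header pairs to a dict of HTTP_* meta-variables."""
    env = {}
    for name, value in headers:
        if name.lower() in SKIPPED_HEADERS:
            continue
        key = header_to_meta_variable(name)
        if key in env:
            # Repeated headers are folded into one value (assumption: ", ").
            env[key] = env[key] + ", " + value
        else:
            env[key] = value
    return env


if __name__ == "__main__":
    print(headers_to_environ([
        ("Accept-Encoding", "gzip,deflate"),
        ("Connection", "keep-alive"),
        ("Content-Type", "text/plain"),
    ]))
```

Joining with ", " is the usual way to combine repeated HTTP headers, but it is not appropriate for every header, so a real implementation would need to look at that more carefully.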
msg375207 - (view) Author: Maarten (maarten) * Date: 2020-08-12 02:54
The CGI examples of urwid (see http://urwid.org/manual/displaymodules.html#cgi-web-display-module-web-display) don't work on http.server because of missing meta variables.

Using cgitb, I found out that the webdriver expects the environment variable `HTTP_X_URWID_METHOD` to be set. The JavaScript sets the "X-Urwid-Method" header (using XMLHttpRequest), but it is not visible to the CGI Python script.

So for some scripts, extra meta-variables need to be set.

I think section 4.1.18 applies because it is an HTTP header that is being set. The section says that these meta-variables are optional, though.

I argue that having access to extra headers is useful.
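
For anyone hitting the same problem, a small diagnostic sketch that a CGI script could use to see which request headers the server actually exposed. The function name `exposed_header_variables` is an assumption for illustration; it only inspects the environment and does not change what http.server provides.

```python
import os


def exposed_header_variables(environ=None):
    """Return the HTTP_* meta-variables present in the environment, sorted by name."""
    if environ is None:
        environ = os.environ
    return {k: environ[k] for k in sorted(environ) if k.startswith("HTTP_")}


if __name__ == "__main__":
    exposed = exposed_header_variables()
    for name, value in exposed.items():
        print(f"{name}={value}")
    # The variable the urwid example expects; absent if the server dropped it.
    print("HTTP_X_URWID_METHOD" in exposed)
```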
msg375801 - (view) Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2020-08-22 14:36
Hello Maarten,

> Using cgitb, I found out that the webdriver expects the environment variable `HTTP_X_URWID_METHOD` to be set. The JavaScript sets the "X-Urwid-Method" header (using XMLHttpRequest), but it is not visible to the CGI Python script.

> So for some scripts, extra meta-variables need to be set.

Thanks for your comment on this old issue. The topic under discussion was the existing "more standard" CGI variables rather than special meta-variables.

Even if the standard CGI variables get exposed first, I doubt the meta-variables will be added. I will consider the minimal change required to accomplish the task and close the issue.
msg382280 - (view) Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2020-12-01 22:40
I spent some time reviewing and researching the specification. It also says

   The server is not required to create meta-variables for all the
   header fields that it receives.  

And in this issue, open since 2010, we have two different sets of variables, one from Apache and one from Nginx. So it is not certain whether http.server should be aligned with either or both, or whether anything is required at all.

The discussion on QUERY_STRING was noted, but as Pierre pointed out, it is already set:

        for k in ('QUERY_STRING', 'REMOTE_HOST', 'CONTENT_LENGTH',
                  'HTTP_USER_AGENT', 'HTTP_COOKIE', 'HTTP_REFERER'):
            env.setdefault(k, "")

For cosmetic purposes, I could remove the existing if condition - https://github.com/python/cpython/pull/23604

I am not sure if we need to add other variables with an empty string value for any reason. 

As a maintainer, I think, we should close this issue.

If there is a bug report, like issue5054, then that is a valid issue, and we should fix it. If any specific issues are raised with parsing, or with the lack of a "required" meta-variable that caused an application to break, those could be fixed as well.

I am closing this issue with a cosmetic change that stemmed out from the discussion - https://github.com/python/cpython/pull/23604
History
Date User Action Args
2020-12-01 22:40:05 orsenthil set status: open -> closed
versions: + Python 3.10, - Python 2.7, Python 3.2, Python 3.3
messages: + msg382280
resolution: wont fix
stage: patch review -> resolved
2020-12-01 22:29:09 orsenthil set keywords: + patch
stage: needs patch -> patch review
pull_requests: + pull_request22473
2020-08-22 14:36:23 orsenthil set messages: + msg375801
stage: needs patch
2020-08-12 02:54:55 maarten set nosy: + maarten
messages: + msg375207
2018-11-13 19:37:32 v+python set messages: + msg329863
2018-11-13 15:29:05 quentel set messages: + msg329851
2018-11-13 10:11:44 remi.lapeyre set messages: + msg329822
2018-11-13 08:11:29 v+python set messages: + msg329815
2018-11-12 23:10:38 remi.lapeyre set messages: + msg329771
2018-11-12 22:10:59 remi.lapeyre set messages: + msg329766
2018-11-10 21:33:11 quentel set nosy: + quentel
2018-11-10 09:50:34 remi.lapeyre set nosy: + remi.lapeyre
2015-02-12 16:26:39 demian.brecht set nosy: + demian.brecht
2015-02-12 04:58:47 martin.panter set nosy: + martin.panter
messages: + msg235797
2012-08-12 12:48:16 berker.peksag set versions: + Python 3.3, - Python 2.6, Python 3.1
2012-04-17 06:05:07 v+python set messages: + msg158532
2011-03-18 02:08:24 orsenthil set assignee: orsenthil
nosy: fdrake, facundobatista, orsenthil, v+python
2010-11-24 03:41:16 v+python set messages: + msg122258
2010-11-21 16:58:47 pitrou set nosy: + fdrake, facundobatista, orsenthil
2010-11-21 07:41:33 v+python create