Author jmlp
Recipients jmlp, orsenthil, pitrou, serhiy.storchaka
Date 2018-06-02.15:10:40
Message-id <1527952240.7.0.592728768989.issue33663@psf.upfronthosting.co.za>
Content
The exception is raised in the start_response function provided by web.py's WSGIGateway class in wsgiserver3.py:1997.

# According to PEP 3333, when using Python 3, the response status
# and headers must be bytes masquerading as unicode; that is, they
# must be of type "str" but are restricted to code points in the
# "latin-1" set.

Therefore, header values must already be strings whenever start_response is called. A WSGI server must accumulate headers in some data structure and call the supplied "start_response" function only once it has gathered all the headers and converted all the values to strings.
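
To make the constraint concrete, here is a minimal sketch of the check a gateway effectively applies. The helper name check_wsgi_headers is hypothetical (it is not part of web.py, CherryPy or the standard library); it only illustrates the "str restricted to latin-1" rule quoted above.

```python
def check_wsgi_headers(headers):
    """Raise if any header name or value violates the PEP 3333 str rule.

    Hypothetical helper, for illustration only.
    """
    for name, value in headers:
        for part in (name, value):
            if not isinstance(part, str):
                raise TypeError(
                    "WSGI header parts must be str, got %r" % (part,))
            # Raises UnicodeEncodeError for code points outside latin-1.
            part.encode("latin-1")


good = [("Content-Type", "text/html"), ("Content-Length", "123")]
bad = [("Content-Type", "text/html"), ("Content-Length", 123)]

check_wsgi_headers(good)  # passes silently

try:
    check_wsgi_headers(bad)
except TypeError as exc:
    error_message = str(exc)
```

The second list fails only because of the int Content-Length value, which is the same kind of value that triggers the exception described in this message.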

The fault I observed is not, strictly speaking, caused by a bug in the Python library's "server.py". Rather, it is a component interaction failure caused by inadequately defined semantics. The interaction between web.py and server.py is quite complex, and neither component is faulty when considered alone.

Let me explain:

Response and header management in server.py is handled by three methods of class BaseHTTPRequestHandler:
- send_response : puts the response line in the buffer
- send_header : converts to string and adds to the buffer
    ("%s: %s\r\n" % (keyword, value)).encode('latin-1', 'strict')
- end_headers : flushes the buffer to the socket

This implementation is correct even if send_header is called with an int value.
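
A short sketch of why that holds, reusing the formatting expression quoted above. The wrapper function format_header is hypothetical and exists only so the expression can be run in isolation.

```python
def format_header(keyword, value):
    # Same expression as the one quoted from send_header: "%s" applies
    # str() to the value, so an int is converted before encoding.
    return ("%s: %s\r\n" % (keyword, value)).encode("latin-1", "strict")


as_int = format_header("Content-Length", 123)
as_str = format_header("Content-Length", "123")
```

Both calls produce the same bytes, so the int never reaches the socket as anything other than its decimal text.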

Now, web.py's application.py defines a "wsgi(env, start_resp)" function, which gets plugged into the CherryPy WSGI HTTP server.

The server is an instance of class wsgiserver.CherryPyWSGIServer created in httpserver.py:169 (digging deeper, actually at line 195).
This server is implemented as an HTTPServer configured to use gateways of class WSGIGateway_10 to handle requests.

A gateway is basically an instance of a class that is initialized with an HTTPRequest instance and has a "respond" method. Of course, WSGIGateway implements "respond" as described in the WSGI standard: it calls the WSGI-compliant web app, which is a function(environ, start_response(status, headers)) returning an iterator (for chunked HTTP responses). The start_response function provided by class WSGIGateway is where the failure occurs.

When the application calls web.py's app.run(), the function runwsgi in web.py's wsgi.py gets called. This function determines whether it receives requests via CGI or directly. In my case it starts an HTTP server using web.py's runsimple function (file httpserver.py:158).

This function never returns: it runs the CherryPyWSGIServer, but first wraps the wsgi function in two WSGI middleware callables. Both are defined in web.py's httpserver.py file. The interesting one is StaticMiddleWare (line 281). Its role is to hijack URLs starting with /static, as is the case with my missing CSS file. In order to serve those static resources quickly, its implementation uses StaticApp (a WSGI function serving static stuff, defined at line 225), which extends Python's SimpleHTTPRequestHandler. That's where the two libraries connect.

StaticApp changes the way headers are processed by overriding send_response, send_header and end_headers. This means that, when StaticApp calls SimpleHTTPRequestHandler.send_head() to send the HEAD part of the response, the headers are managed by the overriding methods. When send_head() finds out that my CSS file does not exist and calls send_error(), a Content-Length header gets written, but its value is not converted to a string, because the overriding implementation just stores the header name and value in a list as they come.
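
The effect can be reproduced with a minimal sketch. The class CollectingHandler is a hypothetical stand-in, not web.py's actual StaticApp code; it only mimics the "store name and value as they come" behaviour described above, and the value 469 is an arbitrary example length.

```python
class CollectingHandler:
    """Hypothetical stand-in for a handler whose overrides only collect."""

    def __init__(self):
        self.status = None
        self.headers = []

    def send_response(self, status, msg=""):
        self.status = "%s %s" % (status, msg)

    def send_header(self, name, value):
        # No str() conversion here: the value is stored as received.
        self.headers.append((name, value))

    def end_headers(self):
        # Nothing is flushed; the headers are handed to start_response later.
        pass


handler = CollectingHandler()
handler.send_response(404, "Not Found")
handler.send_header("Content-Type", "text/html;charset=utf-8")
handler.send_header("Content-Length", 469)  # an int, as passed by send_error
handler.end_headers()

# Values that would be rejected by a start_response requiring str.
non_str = [(n, v) for n, v in handler.headers if not isinstance(v, str)]
```

Because the base class conversion in send_header is bypassed, the int survives until the headers are passed on.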

When StaticApp has finished gathering headers using Python's send_head(), it immediately calls the start_response provided by WSGIGateway, where the failure occurs.

The bug in Python is not strictly that send_header gets called with an int in send_error. Rather, it is a documentation bug which fails to mention that send_header/end_headers MUST CONVERT TO STRING and ENCODE IN LATIN-1.

Therefore, the correction I proposed is still insufficient: even after it is applied, the combination of web.py and server.py does not properly encode the headers.

In conclusion, I would say that:
- In the Python library, the bug is a documentation bug: the documentation fails to state that send_header and/or end_headers can receive header names or values that are not strings and not encoded in strict latin-1, and that it is their responsibility to convert and encode them.
- In web.py, the bug is that the implementation of the overloaded methods fails to properly encode the headers.

Of course, changing int to str does no harm and makes everything more resilient, but does not fix the underlying bug.
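
For illustration, the kind of conversion the overriding methods would need could look like the sketch below. The function normalize_headers is hypothetical and is not taken from web.py; it assumes the collected headers are a list of (name, value) pairs.

```python
def normalize_headers(headers):
    """Convert collected (name, value) pairs to latin-1-safe str pairs.

    Hypothetical helper, for illustration only.
    """
    normalized = []
    for name, value in headers:
        name, value = str(name), str(value)
        # Strict check: raises UnicodeEncodeError outside latin-1.
        name.encode("latin-1", "strict")
        value.encode("latin-1", "strict")
        normalized.append((name, value))
    return normalized


collected = [("Content-Type", "text/css"), ("Content-Length", 42)]
normalized = normalize_headers(collected)
```

Applying such a step before calling start_response would make the override independent of the types that the base class happens to pass to send_header.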