Author tburke
Recipients tburke
Date 2019-05-29.23:32:08
Message-id <1559172729.29.0.908018369806.issue37093@roundup.psfhosted.org>
Content
First, spin up a fairly trivial HTTP server:

    import wsgiref.simple_server
    
    def app(environ, start_response):
        start_response('200 OK', [
            ('Some-Canonical', 'headers'),
            ('sOme-CRAzY', 'hEaDERs'),
            ('Utf-8-Values', '\xe2\x9c\x94'),  # UTF-8 encoding of U+2714
            ('s\xc3\xb6me-UT\xc6\x92-8', 'in the header name'),  # non-ASCII name
            ('some-other', 'random headers'),
        ])
        return [b'Hello, world!\n']
    
    if __name__ == '__main__':
        httpd = wsgiref.simple_server.make_server('', 8000, app)
        while True:
            httpd.handle_request()

Note that this code works equally well on py2 or py3; the interesting bytes on the wire are the same on either.
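To see why the bytes on the wire match, note that on py3 the escapes in `app` produce text, not bytes: `'\xe2\x9c\x94'` is three code points. wsgiref encodes header text as ISO-8859-1, which turns each of those code points back into a single byte. A small standard-library-only check of that round trip (the variable names are just for illustration):

```python
# Header text as written in the WSGI app above. On py3 this is a str holding
# three code points (U+00E2, U+009C, U+0094), not three bytes.
value = '\xe2\x9c\x94'
name = 's\xc3\xb6me-UT\xc6\x92-8'

# wsgiref on py3 encodes header text as ISO-8859-1, which maps every code
# point below U+0100 to the byte with the same value.
wire = value.encode('iso-8859-1')
assert wire == b'\xe2\x9c\x94'

# Read as UTF-8, those bytes are U+2714 HEAVY CHECK MARK, which curl displays.
assert wire.decode('utf-8') == '\u2714'

# The header name takes the same path: o-umlaut (U+00F6) and f-hook (U+0192).
assert name.encode('iso-8859-1').decode('utf-8') == 's\u00f6me-UT\u0192-8'

# Decoding the wire bytes as ISO-8859-1 instead simply reverses the first
# step, giving back the three-code-point string.
assert wire.decode('iso-8859-1') == value
print(ascii(value), '->', wire, '->', ascii(wire.decode('utf-8')))
```

That last round trip is also why the py3 output below shows `'Utf-8-Values': 'â\x9c\x94'`: it is the same three code points, just displayed differently.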

Verify the expected response using an independent tool such as curl:

    $ curl -v http://localhost:8000
    *   Trying ::1...
    * TCP_NODELAY set
    * connect to ::1 port 8000 failed: Connection refused
    *   Trying 127.0.0.1...
    * TCP_NODELAY set
    * Connected to localhost (127.0.0.1) port 8000 (#0)
    > GET / HTTP/1.1
    > Host: localhost:8000
    > User-Agent: curl/7.64.0
    > Accept: */*
    > 
    * HTTP 1.0, assume close after body
    < HTTP/1.0 200 OK
    < Date: Wed, 29 May 2019 23:02:37 GMT
    < Server: WSGIServer/0.2 CPython/3.7.3
    < Some-Canonical: headers
    < sOme-CRAzY: hEaDERs
    < Utf-8-Values: ✔
    < söme-UTƒ-8: in the header name
    < some-other: random headers
    < Content-Length: 14
    < 
    Hello, world!
    * Closing connection 0

Check that py2 includes all the same headers:

    $ python2 -c 'import pprint, urllib; resp = urllib.urlopen("http://localhost:8000"); pprint.pprint((dict(resp.info().items()), resp.read()))'
    ({'content-length': '14',
      'date': 'Wed, 29 May 2019 23:03:02 GMT',
      'server': 'WSGIServer/0.2 CPython/3.7.3',
      'some-canonical': 'headers',
      'some-crazy': 'hEaDERs',
      'some-other': 'random headers',
      's\xc3\xb6me-ut\xc6\x92-8': 'in the header name',
      'utf-8-values': '\xe2\x9c\x94'},
     'Hello, world!\n')

But py3 *does not*:

    $ python3 -c 'import pprint, urllib.request; resp = urllib.request.urlopen("http://localhost:8000"); pprint.pprint((dict(resp.info().items()), resp.read()))'
    ({'Date': 'Wed, 29 May 2019 23:04:09 GMT',
      'Server': 'WSGIServer/0.2 CPython/3.7.3',
      'Some-Canonical': 'headers',
      'Utf-8-Values': 'â\x9c\x94',
      'sOme-CRAzY': 'hEaDERs'},
     b'Hello, world!\n')

Instead, it is missing the first header that has a non-ASCII name as well as all subsequent headers (even if they are all-ASCII). Interestingly, the response body is intact.
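The truncation can also be reproduced offline by handing the raw response (rebuilt by hand from the curl transcript above, minus the Date and Server headers) to `http.client.HTTPResponse`. `FakeSocket` is a hypothetical stand-in that implements only the `makefile()` call `HTTPResponse` makes. The sketch also hints at why the body survives: Content-Length is one of the swallowed headers, so http.client has no length for this HTTP/1.0 response and simply reads until EOF.

```python
import http.client
import io

# Raw response as seen by curl above (Date and Server left out for brevity).
RAW = (
    b'HTTP/1.0 200 OK\r\n'
    b'Some-Canonical: headers\r\n'
    b'sOme-CRAzY: hEaDERs\r\n'
    b'Utf-8-Values: \xe2\x9c\x94\r\n'
    b's\xc3\xb6me-UT\xc6\x92-8: in the header name\r\n'
    b'some-other: random headers\r\n'
    b'Content-Length: 14\r\n'
    b'\r\n'
    b'Hello, world!\n'
)


class FakeSocket:
    """Hypothetical stand-in socket: HTTPResponse only calls makefile()."""

    def __init__(self, data):
        self._data = data

    def makefile(self, mode, *args, **kwargs):
        # Serve the canned bytes as the buffered file HTTPResponse reads from.
        return io.BytesIO(self._data)


resp = http.client.HTTPResponse(FakeSocket(RAW))
resp.begin()                      # parses the status line and headers
headers = dict(resp.getheaders())
body = resp.read()                # no length known, so reads to EOF

print(sorted(headers))            # ['Some-Canonical', 'Utf-8-Values', 'sOme-CRAzY']
print(resp.getheader('Content-Length'))  # None: swallowed with the rest
print(body)                       # b'Hello, world!\n'
```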

This can eventually be traced back to email.feedparser's expectation that all headers conform to RFC 822, and its assumption that any line that *doesn't* conform must be the start of the body: https://github.com/python/cpython/blob/v3.7.3/Lib/email/feedparser.py#L228-L236
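The feedparser behavior can be isolated by giving `email.parser.Parser` the header block the way http.client's parse_headers (linked below) builds it: the raw header lines joined and decoded as ISO-8859-1. The pattern feedparser uses for header names only admits printable ASCII (other than the colon) before the `:`, so the decoded `söme-UTƒ-8` line fails to match, a `MissingHeaderBodySeparatorDefect` is recorded, and that line plus everything after it become the payload. A minimal sketch, with `HEADER_BLOCK` as illustrative data:

```python
import email.parser

# Header block as http.client collects it (status line already consumed),
# joined and decoded from ISO-8859-1 into a str before parsing.
HEADER_BLOCK = (
    b'Some-Canonical: headers\r\n'
    b'sOme-CRAzY: hEaDERs\r\n'
    b'Utf-8-Values: \xe2\x9c\x94\r\n'
    b's\xc3\xb6me-UT\xc6\x92-8: in the header name\r\n'
    b'some-other: random headers\r\n'
    b'Content-Length: 14\r\n'
    b'\r\n'
).decode('iso-8859-1')

msg = email.parser.Parser().parsestr(HEADER_BLOCK)

# Only the headers before the first non-matching line are parsed as headers.
print(msg.keys())  # ['Some-Canonical', 'sOme-CRAzY', 'Utf-8-Values']

# The parser notes that the "body" began without a blank separator line...
print([type(d).__name__ for d in msg.defects])  # ['MissingHeaderBodySeparatorDefect']

# ...and the non-ASCII line and every header after it land in the payload.
print(ascii(msg.get_payload()))
```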

However, http.client has *already* determined the boundary between headers and body in parse_headers, and sent everything that it thinks is headers to the parser: https://github.com/python/cpython/blob/v3.7.3/Lib/http/client.py#L193-L214
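The header/body split itself happens before the email parser is involved: `http.client.parse_headers` reads lines from the file object up to the blank line and only then parses them. Using an `io.BytesIO` in place of the socket file (an illustrative assumption; `parse_headers` is the module-level helper linked above and may not be a stable public API), the body is still unread afterwards even though the later header lines never show up as headers:

```python
import http.client
import io

# Everything after the status line, as it would arrive on the socket.
fp = io.BytesIO(
    b'Some-Canonical: headers\r\n'
    b'Utf-8-Values: \xe2\x9c\x94\r\n'
    b's\xc3\xb6me-UT\xc6\x92-8: in the header name\r\n'
    b'some-other: random headers\r\n'
    b'\r\n'
    b'Hello, world!\n'
)

msg = http.client.parse_headers(fp)

# parse_headers stopped at the blank line, so the body is still unread...
rest = fp.read()
print(rest)  # b'Hello, world!\n'

# ...but the lines after the non-ASCII name were never exposed as headers.
print(msg.keys())             # ['Some-Canonical', 'Utf-8-Values']
print(msg.get('some-other'))  # None
```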