Message 343942 - Python tracker

Author	tburke
Recipients	tburke
Date	2019-05-29.23:32:08
Message-id	<1559172729.29.0.908018369806.issue37093@roundup.psfhosted.org>
In-reply-to

Content

First, spin up a fairly trivial HTTP server:

    import wsgiref.simple_server
    
    def app(environ, start_response):
        start_response('200 OK', [
            ('Some-Canonical', 'headers'),
            ('sOme-CRAzY', 'hEaDERs'),
            ('Utf-8-Values', '\xe2\x9c\x94'),
            ('s\xc3\xb6me-UT\xc6\x92-8', 'in the header name'),
            ('some-other', 'random headers'),
        ])
        return [b'Hello, world!\n']
    
    if __name__ == '__main__':
        httpd = wsgiref.simple_server.make_server('', 8000, app)
        while True:
            httpd.handle_request()

Note that this code works equally well on py2 or py3; the interesting bytes on the wire are the same on either.
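
To see why the bytes match, here is a minimal py3 sketch of how the header strings map to bytes. It assumes the server writes header strings to the socket as Latin-1, which is my reading of wsgiref and not something shown above; the variable names are only for illustration.

```python
# Header strings exactly as written in the app above. Each \xNN escape is
# one code point below 256, which Latin-1 maps to exactly one byte.
value = '\xe2\x9c\x94'
name = 's\xc3\xb6me-UT\xc6\x92-8'

# Assumption: the server encodes header strings as Latin-1 before sending.
value_bytes = value.encode('latin-1')
name_bytes = name.encode('latin-1')

# Read back as UTF-8, the same bytes are the text that curl displays.
print(value_bytes.decode('utf-8'))
print(name_bytes.decode('utf-8'))
```

Decoding those same bytes as Latin-1 instead of UTF-8 gives back the original escaped strings, which is the form py3 shows for the Utf-8-Values header further down.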

Verify the expected response using an independent tool such as curl:

    $ curl -v http://localhost:8000
    *   Trying ::1...
    * TCP_NODELAY set
    * connect to ::1 port 8000 failed: Connection refused
    *   Trying 127.0.0.1...
    * TCP_NODELAY set
    * Connected to localhost (127.0.0.1) port 8000 (#0)
    > GET / HTTP/1.1
    > Host: localhost:8000
    > User-Agent: curl/7.64.0
    > Accept: */*
    > 
    * HTTP 1.0, assume close after body
    < HTTP/1.0 200 OK
    < Date: Wed, 29 May 2019 23:02:37 GMT
    < Server: WSGIServer/0.2 CPython/3.7.3
    < Some-Canonical: headers
    < sOme-CRAzY: hEaDERs
    < Utf-8-Values: ✔
    < söme-UTƒ-8: in the header name
    < some-other: random headers
    < Content-Length: 14
    < 
    Hello, world!
    * Closing connection 0

Check that py2 includes all the same headers:

    $ python2 -c 'import pprint, urllib; resp = urllib.urlopen("http://localhost:8000"); pprint.pprint((dict(resp.info().items()), resp.read()))'
    ({'content-length': '14',
      'date': 'Wed, 29 May 2019 23:03:02 GMT',
      'server': 'WSGIServer/0.2 CPython/3.7.3',
      'some-canonical': 'headers',
      'some-crazy': 'hEaDERs',
      'some-other': 'random headers',
      's\xc3\xb6me-ut\xc6\x92-8': 'in the header name',
      'utf-8-values': '\xe2\x9c\x94'},
     'Hello, world!\n')

But py3 *does not*:

    $ python3 -c 'import pprint, urllib.request; resp = urllib.request.urlopen("http://localhost:8000"); pprint.pprint((dict(resp.info().items()), resp.read()))'
    ({'Date': 'Wed, 29 May 2019 23:04:09 GMT',
      'Server': 'WSGIServer/0.2 CPython/3.7.3',
      'Some-Canonical': 'headers',
      'Utf-8-Values': 'â\x9c\x94',
      'sOme-CRAzY': 'hEaDERs'},
     b'Hello, world!\n')

Instead, the py3 result is missing the first header that has a non-ASCII name as well as every header after it, even all-ASCII ones such as some-other and Content-Length. Interestingly, the response body is intact.

This traces back to email.feedparser's expectation that all headers conform to RFC 822, and its assumption that anything that *doesn't* conform must be part of the body: https://github.com/python/cpython/blob/v3.7.3/Lib/email/feedparser.py#L228-L236
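
A rough sketch of that behaviour, not the actual feedparser code: the pattern below is my approximation of the header-line check in the linked source, and `split_headers` is a hypothetical helper that stops at the first line that fails it.

```python
import re

# Approximation of the feedparser header check: a line counts as a header
# only if it starts with "From ", or with printable ASCII characters
# (excluding ":") followed by ":", or with whitespace (a continuation line).
HEADER_LINE = re.compile(r'^(From |[\041-\071\073-\176]*:|[\t ])')

lines = [
    'Some-Canonical: headers',
    'Utf-8-Values: \xe2\x9c\x94',
    's\xc3\xb6me-UT\xc6\x92-8: in the header name',
    'some-other: random headers',
]


def split_headers(lines):
    """Collect header lines until the first line that does not match."""
    headers = []
    for i, line in enumerate(lines):
        if not HEADER_LINE.match(line):
            # This line and everything after it is treated as body.
            return headers, lines[i:]
        headers.append(line)
    return headers, []


headers, leftover = split_headers(lines)
print(headers)
print(leftover)
```

With these inputs the first two lines are kept as headers, and the non-ASCII name plus the all-ASCII some-other line end up in the leftover. A non-ASCII *value* is fine because only the name has to match.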

However, http.client has *already* determined the boundary between headers and body in parse_headers, and sent everything that it thinks is headers to the parser: https://github.com/python/cpython/blob/v3.7.3/Lib/http/client.py#L193-L214
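
The same thing can probably be reproduced without a server by feeding raw header bytes straight to `http.client.parse_headers`. This is only a sketch: the `raw` bytes are hand-written to mirror the app's headers, and whether some-other shows up depends on the Python version in use.

```python
import io
import http.client

# Hand-written header block mirroring what the test server sends,
# terminated by the blank line that marks the end of the headers.
raw = (
    b'Some-Canonical: headers\r\n'
    b'sOme-CRAzY: hEaDERs\r\n'
    b'Utf-8-Values: \xe2\x9c\x94\r\n'
    b's\xc3\xb6me-UT\xc6\x92-8: in the header name\r\n'
    b'some-other: random headers\r\n'
    b'\r\n'
)

# parse_headers reads up to the blank line, then hands everything it
# read to the email parser.
msg = http.client.parse_headers(io.BytesIO(raw))

print(sorted(msg.keys()))
# On an affected version this is expected to print False.
print('some-other' in msg)
```

Since parse_headers stops reading at the blank line, anything the email parser then decides is "body" is silently dropped rather than being returned with the response body.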
