classification
Title: Urllib/Urlopen IncompleteRead with HTTP header with new line characters
Type: behavior Stage:
Components: Library (Lib) Versions: Python 2.7
process
Status: closed Resolution: duplicate
Dependencies: Superseder: httplib fails to handle semivalid HTTP headers
View: 24363
Assigned To: Nosy List: martin.panter, rugk
Priority: normal Keywords:

Created on 2016-06-11 14:04 by rugk, last changed 2016-06-11 17:55 by rugk. This issue is now closed.

Messages (3)
msg268212 - (view) Author: (rugk) Date: 2016-06-11 14:03
Test file: https://gist.github.com/rugk/3ea35d04d66c2295e02d0b6cb6d822a2
Python version: 2.7.5+

Issue description: When urllib receives an HTTP header that contains line breaks (newline characters), reading the response fails with the following error:

```
Traceback (most recent call last):
  File "./downloadtest.py", line 17, in <module>
    respdata = resp.read()
  File "/usr/lib/python2.7/socket.py", line 351, in read
    data = self._sock.recv(rbufsize)
  File "/usr/lib/python2.7/httplib.py", line 543, in read
    return self._read_chunked(amt)
  File "/usr/lib/python2.7/httplib.py", line 597, in _read_chunked
    raise IncompleteRead(''.join(value))
httplib.IncompleteRead: IncompleteRead(0 bytes read)
```

Compare the results with curl...

# Broken version

## curl

```
$ curl -i https://rugk.dedyn.io/pythontest/bug
HTTP/1.1 200 OK
Server: nginx
Date: Sat, 11 Jun 2016 13:34:36 GMT
Content-Type: text/plain
Transfer-Encoding: chunked
Connection: keep-alive
Strict-Transport-Security: max-age=15768000; includeSubDomains; preload
Public-Key-Pins: 
pin-sha256="306cc4Cc2py0x48ZiX2G5vt5OxF9afmouqccrFqb8Jc=";
pin-sha256="dWkVtg0EuckExnceVFvu3tuEApEygbxr2FPTlpHAUrQ=";
pin-sha256="DjjVxb2/6kxfX8qyP2TE/j8B0tOB60MhTTvJdNsFPaU=";
max-age=5184000; includeSubDomains;
report-uri="https://rugkdyndns.report-uri.io/r/default/hpkp/enforce"

Bug: 
```

## python
```
$ ./downloadtest.py https://rugk.dedyn.io/pythontest/bug
Accessing https://rugk.dedyn.io/pythontest/bug...
Traceback (most recent call last):
  File "./downloadtest.py", line 17, in <module>
    respdata = resp.read()
  File "/usr/lib/python2.7/socket.py", line 351, in read
    data = self._sock.recv(rbufsize)
  File "/usr/lib/python2.7/httplib.py", line 543, in read
    return self._read_chunked(amt)
  File "/usr/lib/python2.7/httplib.py", line 597, in _read_chunked
    raise IncompleteRead(''.join(value))
httplib.IncompleteRead: IncompleteRead(0 bytes read)
```

# Working version

## curl
```
$ curl -i https://rugk.dedyn.io/pythontest/works
HTTP/1.1 200 OK
Server: nginx
Date: Sat, 11 Jun 2016 13:46:09 GMT
Content-Type: text/plain
Transfer-Encoding: chunked
Connection: keep-alive
Strict-Transport-Security: max-age=15768000; includeSubDomains; preload
Public-Key-Pins: pin-sha256="306cc4Cc2py0x48ZiX2G5vt5OxF9afmouqccrFqb8Jc="; pin-sha256="dWkVtg0EuckExnceVFvu3tuEApEygbxr2FPTlpHAUrQ="; pin-sha256="DjjVxb2/6kxfX8qyP2TE/j8B0tOB60MhTTvJdNsFPaU="; max-age=5184000; includeSubDomains; report-uri="https://rugkdyndns.report-uri.io/r/default/hpkp/enforce"

Bug: 
```

## python
```
$ ./downloadtest.py https://rugk.dedyn.io/pythontest/works
Accessing https://rugk.dedyn.io/pythontest/works...
RAW:
Bug: 


Decoded:
Bug:
```

You can also test it with HTTP URLs and get the same result.

In common browsers both requests work...

I cannot guarantee that the test server will stay available...
msg268215 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2016-06-11 15:04
HTTP header fields are not supposed to have line breaks unless followed by a space or tab. So the server is actually providing a faulty response.

However Python could do better at handling this case. There is already a bug open for this: Issue 24363.
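
The line-break rule above can be sketched as a small check. This is only an illustration, not httplib's code: `find_stray_lines` is a made-up helper name, the field-name pattern follows the RFC 7230 token characters, and the sample values are shortened from the real header.

```python
import re

# A field line starts with a token (the field name) followed by a colon.
FIELD_LINE = re.compile(r"^[!#$%&'*+\-.^_`|~0-9A-Za-z]+:")


def find_stray_lines(header_block):
    """Return lines that are neither a header field nor a folded continuation.

    header_block is the text after the status line, up to the blank line.
    """
    stray = []
    # Split on LF and strip a trailing CR so bare-LF line ends are seen too.
    for line in header_block.split("\n"):
        line = line.rstrip("\r")
        if not line:
            break  # blank line ends the header section
        if line[0] in " \t":
            continue  # line break followed by space or tab: allowed folding
        if not FIELD_LINE.match(line):
            stray.append(line)
    return stray


# Shortened sample values, not the full pins from the server.
folded = 'Public-Key-Pins: pin-sha256="a";\r\n max-age=5184000\r\n\r\n'
broken = 'Public-Key-Pins: \npin-sha256="a";\nmax-age=5184000\r\n\r\n'

print(find_stray_lines(folded))
print(find_stray_lines(broken))
```

The folded variant passes because each continuation line starts with a space; the broken variant reports the two lines that start in column one without being a field.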

For the record, the full server response I get is:

'HTTP/1.1 200 OK\r\n'
'Server: nginx\r\n'
'Date: Sat, 11 Jun 2016 14:47:19 GMT\r\n'
'Content-Type: text/plain\r\n'
'Transfer-Encoding: chunked\r\n'
'Connection: close\r\n'
'Vary: Accept-Encoding\r\n'
'Strict-Transport-Security: max-age=15768000; includeSubDomains; preload\r\n'
'Public-Key-Pins: \n'
'pin-sha256="306cc4Cc2py0x48ZiX2G5vt5OxF9afmouqccrFqb8Jc=";\n'
'pin-sha256="dWkVtg0EuckExnceVFvu3tuEApEygbxr2FPTlpHAUrQ=";\n'
'pin-sha256="DjjVxb2/6kxfX8qyP2TE/j8B0tOB60MhTTvJdNsFPaU=";\n'
'max-age=5184000; includeSubDomains;\n'
'report-uri="https://rugkdyndns.report-uri.io/r/default/hpkp/enforce"\r\n'
'\r\n'
'28\r\n'
'Bug: https://bugs.python.org/issue27296\n'
'\r\n'
'0\r\n'
'\r\n'
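
To illustrate how the chunked body above is meant to be read, and one plausible way the read could end in IncompleteRead, here is a minimal decoder sketch. It is not httplib's implementation; `decode_chunked` is a hypothetical helper, and the second half assumes that header parsing gave up at the first stray line, so that the chunked reader starts in the middle of the header.

```python
def decode_chunked(body):
    """Decode a chunked body held entirely in memory (trailers are ignored)."""
    out = []
    pos = 0
    while True:
        eol = body.index(b"\r\n", pos)
        # The chunk size is hexadecimal; drop any ";extension" after it.
        size = int(body[pos:eol].split(b";")[0], 16)
        if size == 0:
            break  # a zero-sized chunk marks the end of the body
        start = eol + 2
        out.append(body[start:start + size])
        pos = start + size + 2  # skip the CRLF that follows the chunk data
    return b"".join(out)


# The body exactly as sent by the server: 0x28 = 40 bytes of data.
body = b"28\r\nBug: https://bugs.python.org/issue27296\n\r\n0\r\n\r\n"
print(decode_chunked(body))

# Assumption: the parser stopped early, so a header fragment is read as a size.
leftover = b'pin-sha256="306cc4Cc2py0x48ZiX2G5vt5OxF9afmouqccrFqb8Jc=";\n' + body
try:
    decode_chunked(leftover)
except ValueError as exc:
    print("cannot parse chunk size:", exc)
```

In this sketch the failure surfaces as a ValueError before any data is decoded, which would be consistent with the "0 bytes read" in the traceback.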
msg268237 - (view) Author: (rugk) Date: 2016-06-11 17:55
Yeah, it might not be standard or best practice to send such headers, but at least all major browsers and curl do not complain about it. Major browsers even treat this HPKP header as intended.

But instead of showing complex error messages Python could just ignore the malformed header...
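
A rough sketch of what "just ignore the malformed header" could look like. This is not how httplib or any browser is known to do it; `parse_headers_leniently` is a hypothetical name and the pin list is shortened. The idea is to keep reading until the blank line, keep the fields that parse, and set aside the lines that do not, so that the body still starts in the right place.

```python
import re

# Field name (RFC 7230 token characters), colon, optional whitespace, value.
FIELD = re.compile(r"^([!#$%&'*+\-.^_`|~0-9A-Za-z]+):[ \t]*(.*)$")


def parse_headers_leniently(header_block):
    """Return (headers, ignored): parsed fields and the lines that were skipped."""
    headers, ignored = [], []
    for line in header_block.split("\n"):
        line = line.rstrip("\r")
        if not line:
            break  # blank line: end of the header section
        if line[0] in " \t" and headers:
            # Properly folded continuation: append to the previous field.
            name, value = headers[-1]
            headers[-1] = (name, (value + " " + line.strip()).strip())
            continue
        match = FIELD.match(line)
        if match:
            headers.append((match.group(1), match.group(2).rstrip()))
        else:
            ignored.append(line)  # malformed line: skip it, keep parsing
    return headers, ignored


sample = (
    "Transfer-Encoding: chunked\r\n"
    "Public-Key-Pins: \n"
    'pin-sha256="306cc4Cc2py0x48ZiX2G5vt5OxF9afmouqccrFqb8Jc=";\n'
    "max-age=5184000; includeSubDomains;\n"
    'report-uri="https://rugkdyndns.report-uri.io/r/default/hpkp/enforce"\r\n'
    "\r\n"
)
headers, ignored = parse_headers_leniently(sample)
print(headers)
print(len(ignored), "line(s) ignored")
```

With this approach Transfer-Encoding is still seen and Public-Key-Pins ends up with an empty value; whether the stray lines should instead be joined onto the previous field is a separate design question.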
History
Date User Action Args
2016-06-11 17:55:03 rugk set messages: + msg268237
2016-06-11 15:04:09 martin.panter set status: open -> closed
    nosy: + martin.panter
    messages: + msg268215
    superseder: httplib fails to handle semivalid HTTP headers
    resolution: duplicate
2016-06-11 14:04:01 rugk create