Issue25539
This issue tracker has been migrated to GitHub,
and is currently read-only.
For more information,
see the GitHub FAQs in the Python's Developer Guide.
Created on 2015-11-03 06:38 by 신동원, last changed 2022-04-11 14:58 by admin. This issue is now closed.
Messages (12) | |||
---|---|---|---|
msg253966 - (view) | Author: Shin Dong-won (신동원) | Date: 2015-11-03 06:38 | |
I tried to login some website using requests session, but it failed because of parsing header. This is header data of problem. It contains two line of continuous CRLF between the data. Date: Mon, 02 Nov 2015 08:45:48 GMT Server: Apache/2.2.20 (Unix) mod_ssl/2.2.20 OpenSSL/0.9.8e-fips-rhel5 DAV/2 PHP/5.2.16 mod_fastcgi/2.4.6 X-Powered-By: PHP/5.2.16 P3P: CP="NOI CURa ADMa DEVa TAIa OUR DELa BUS IND PHY ONL UNI COM NAV INT DEM PRE" P3P : CP="ALL CURa ADMa DEVa TAIa OUR BUS IND PHY ONL UNI PUR FIN COM NAV INT DEM CNT STA POL HEA PRE LOC OTC" Set-Cookie: PHPSESSID=**; path=/ Set-Cookie: PHPSESSID=**; path=/ Content-Length: 81 Connection: close Content-Type: text/html function parse_headers in http.client, parsing exits when it meet CRLF. (https://hg.python.org/cpython/file/tip/Lib/http/client.py#l197) Is it standard? Same code works well in python2, or even similar code in ruby. |
|||
msg253971 - (view) | Author: Martin Panter (martin.panter) * | Date: 2015-11-03 08:47 | |
Are you able to print out the repr() of the header or the entire HTTP response so we can see exactly what characters are there? Or provide a URL if it is a public server. I suspect it may not be a completely blank line, but may have whitespace there. Both Python 2 and 3 should stop parsing the HTTP header when they meet a blank line (two CRLFs in a row). This marks the start of the HTTP body. See <https://tools.ietf.org/html/rfc7230#section-3>. |
|||
msg253977 - (view) | Author: Shin Dong-won (신동원) | Date: 2015-11-03 10:22 | |
b'Date: Tue, 03 Nov 2015 10:05:42 GMT\nServer: Apache/2.2.20 (Unix) mod_ssl/2.2.20 OpenSSL/0.9.8e-fips-rhel5 DAV/2\n PHP/5.2.16 mod_fastcgi/2.4.6\nX-Powered-By: PHP/5.2.16\nP3P: CP="NOI CURa ADMa DEVa TAIa OUR DELa BUS IND PHY ONL UNI COM NAV INT DEM\n PRE"\n\nP3P : CP="ALL CURa ADMa DEVa TAIa OUR BUS IND PHY ONL UNI PUR FIN COM NAV INT DEM CNT STA POL HEA PRE LOC OTC"\nSet-Cookie: PHPSESSID=**; path=/\nSet-Cookie: PHPSESSID=**; path=/\nContent-Length: 79\nConnection: close\nContent-Type: text/html\n\n' There's no other characters except CRLF. And other weird result, in python2 there are no two CRLF between P3P:~. Here's gist for test. (https://gist.github.com/littmus/9625a4436e1edfb3afe9) |
|||
msg253978 - (view) | Author: STINNER Victor (vstinner) * | Date: 2015-11-03 10:33 | |
> It contains two line of continuous CRLF between the data. It looks like a bug at server side. \n\n is the separator between headers and data. An header must not end with \n\n. |
|||
msg253979 - (view) | Author: Shin Dong-won (신동원) | Date: 2015-11-03 10:35 | |
I think so but same http request works well in Python2 |
|||
msg253980 - (view) | Author: Shin Dong-won (신동원) | Date: 2015-11-03 11:00 | |
I had a mistake. That's not CRLF but just two '\n'. |
|||
msg253983 - (view) | Author: Shin Dong-won (신동원) | Date: 2015-11-03 11:26 | |
I monkey patched the method `parse_headers` to print what the fp read and finally arrived at that http.client is not a problem. email.parser.Parser.parsestr adds two '\n' between two lines of "P3P: ~~". end: b'POST /bbs/login_check.php HTTP/1.1\r\nHost: www.koreapas.com\r\nAccept-Encoding: identity\r\nContent-Length: 31\r\nUser-Agent: Mozilla/5.0 (Linux; Android 4.2.2; GT-I9505 Build/JDQ39) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/31.0.1650.59 Mobile Safari/537.36\r\nContent-Type: application/x-www-form-urlencoded\r\n\r\nuser_id=blahsdfi&password=qwera' reply: 'HTTP/1.1 200 OK\r\n' b'Date: Tue, 03 Nov 2015 11:23:16 GMT\r\n' b'Server: Apache/2.2.20 (Unix) mod_ssl/2.2.20 OpenSSL/0.9.8e-fips-rhel5 DAV/2 PHP/5.2.16 mod_fastcgi/2.4.6\r\n' b'X-Powered-By: PHP/5.2.16\r\n' b'P3P: CP="NOI CURa ADMa DEVa TAIa OUR DELa BUS IND PHY ONL UNI COM NAV INT DEM PRE"\r\n' b'P3P : CP="ALL CURa ADMa DEVa TAIa OUR BUS IND PHY ONL UNI PUR FIN COM NAV INT DEM CNT STA POL HEA PRE LOC OTC"\r\n' b'Set-Cookie: PHPSESSID=e990184309013a2c97579c76a52e203a; path=/\r\n' b'Content-Length: 1504\r\n' b'Connection: close\r\n' b'Content-Type: text/html\r\n' b'\r\n' ########################## hstring Date: Tue, 03 Nov 2015 11:23:16 GMT Server: Apache/2.2.20 (Unix) mod_ssl/2.2.20 OpenSSL/0.9.8e-fips-rhel5 DAV/2 PHP/5.2.16 mod_fastcgi/2.4.6 X-Powered-By: PHP/5.2.16 P3P: CP="NOI CURa ADMa DEVa TAIa OUR DELa BUS IND PHY ONL UNI COM NAV INT DEM PRE" P3P : CP="ALL CURa ADMa DEVa TAIa OUR BUS IND PHY ONL UNI PUR FIN COM NAV INT DEM CNT STA POL HEA PRE LOC OTC" Set-Cookie: PHPSESSID=e990184309013a2c97579c76a52e203a; path=/ Content-Length: 1504 Connection: close Content-Type: text/html ###################################### ########################### parsed b'Date: Tue, 03 Nov 2015 11:23:16 GMT\nServer: Apache/2.2.20 (Unix) mod_ssl/2.2.20 OpenSSL/0.9.8e-fips-rhel5 DAV/2\n PHP/5.2.16 mod_fastcgi/2.4.6\nX-Powered-By: PHP/5.2.16\nP3P: CP="NOI CURa ADMa DEVa TAIa OUR DELa BUS IND PHY ONL UNI COM NAV INT DEM\n PRE"\n\nP3P : CP="ALL CURa ADMa DEVa TAIa OUR BUS IND PHY ONL UNI PUR FIN COM NAV INT DEM CNT STA POL HEA PRE LOC OTC"\nSet-Cookie: PHPSESSID=e990184309013a2c97579c76a52e203a; path=/\nContent-Length: 1504\nConnection: close\nContent-Type: text/html\n\n' ####################################### |
|||
msg253984 - (view) | Author: Shin Dong-won (신동원) | Date: 2015-11-03 11:54 | |
I found that Python3 no more uses header format from rfc822, and the data b'P3P : CP="ALL CURa ADMa DEVa TAIa OUR BUS IND PHY ONL UNI PUR FIN COM NAV INT DEM CNT STA POL HEA PRE LOC OTC"\r\n' not match with `headerRE = re.compile(r'^(From |[\041-\071\073-\176]*:|[\t ])')` so it makes problem. |
|||
msg254037 - (view) | Author: Martin Panter (martin.panter) * | Date: 2015-11-04 02:24 | |
Okay, now I understand the problem. There is a quirky header line with a space in the field name “P3P ”. The HTTP client’s parse_headers() treats this as any other header line, but then it passes the header to the email package, which interprets this line as being invalid and marking the end of the header. Therefore subsequent important header fields are missed, including Set-Cookie and Content-Length. According to <https://tools.ietf.org/html/rfc7230#section-3.2>, that P3P line is technically not valid. Here is a self-contained demo or test case: from socket import socket from threading import Thread from http.client import HTTPConnection def serve(): [client, _] = server.accept() with client, client.makefile("rb") as reader: while reader.readline().rstrip(b"\r\n"): pass client.sendall( b"HTTP/1.1 200 OK\r\n" b"Content-Length: 0\r\n" b"Extra-Space : invalid\r\n" b"Set-Cookie: name=value\r\n" b"\r\n" ) with socket() as server: server.bind(("localhost", 0)) server.listen() background = Thread(target=serve) background.start() http = HTTPConnection(*server.getsockname()) http.request("GET", "/") response = http.getresponse() print(response.msg.items()) # Set-Cookie is missing http.close() background.join() The question is, should Python go out of its way to handle this server bug? It would probably require implementing a more permissive version of the header parser in the HTTP client, rather than reusing the stricter “email” module’s parser. |
|||
msg254048 - (view) | Author: Martin Panter (martin.panter) * | Date: 2015-11-04 10:48 | |
Just noticed the whitespace scenario is mentioned at <https://tools.ietf.org/html/rfc7230#section-3.2.4>: ''' No whitespace is allowed between the header field-name and colon. In the past, differences in the handling of such whitespace have led to security vulnerabilities in . . . response handling. . . . A proxy must remove any such whitespace from a response message before forwarding the message downstream. ''' It would not be possible build a proxy that does that using Python 3’s current HTTP client. |
|||
msg254049 - (view) | Author: Shin Dong-won (신동원) | Date: 2015-11-04 10:54 | |
Yeah, this is server's fault and python does not have to deal with non-standard cases. I'll close this issue. Thanks! |
|||
msg254056 - (view) | Author: R. David Murray (r.david.murray) * | Date: 2015-11-04 15:19 | |
Support for handling such headers could be added to the new email API (ie: add a policy setting to accept them), if someone wants to make a feature request. |
History | |||
---|---|---|---|
Date | User | Action | Args |
2022-04-11 14:58:23 | admin | set | github: 69725 |
2015-11-04 15:19:30 | r.david.murray | set | nosy:
+ r.david.murray messages: + msg254056 |
2015-11-04 10:54:23 | 신동원 | set | status: open -> closed messages: + msg254049 |
2015-11-04 10:48:01 | martin.panter | set | messages: + msg254048 |
2015-11-04 02:24:16 | martin.panter | set | messages:
+ msg254037 versions: + Python 3.6 |
2015-11-03 11:54:33 | 신동원 | set | messages: + msg253984 |
2015-11-03 11:26:12 | 신동원 | set | messages: + msg253983 |
2015-11-03 11:00:29 | 신동원 | set | messages: + msg253980 |
2015-11-03 10:35:20 | 신동원 | set | messages: + msg253979 |
2015-11-03 10:33:37 | vstinner | set | nosy:
+ vstinner messages: + msg253978 |
2015-11-03 10:22:10 | 신동원 | set | messages: + msg253977 |
2015-11-03 08:47:40 | martin.panter | set | type: crash -> behavior |
2015-11-03 08:47:15 | martin.panter | set | nosy:
+ martin.panter messages: + msg253971 |
2015-11-03 06:38:03 | 신동원 | create |