classification
Title: Cannot override 'connection: close' in urllib2 headers
Type: behavior Stage:
Components: IO Versions: Python 3.2, Python 3.3, Python 2.7
process
Status: open Resolution: not a bug
Dependencies: Superseder:
Assigned To: Nosy List: jcea, martin.panter, orsenthil, r.david.murray, s7v7nislands@gmail.com, sanxiago, shubhojeet.ghosh
Priority: normal Keywords:

Created on 2011-08-28 19:19 by shubhojeet.ghosh, last changed 2015-09-09 05:07 by martin.panter.

Messages (6)
msg143120 - Author: Shubhojeet Ghosh (shubhojeet.ghosh) Date: 2011-08-28 19:19
There seems to be an issue with urllib2.
The headers I define do not match the headers in the actual packet (captured with Wireshark). Other headers, such as User-Agent and Cookie, work fine.
Here is an example of a failure:

Python code:

    import urllib2

    url = "http://www.python.org"

    req = urllib2.Request(url)
    # Ask the server for a persistent (keep-alive) connection
    req.add_header('Connection', "keep-alive")
    u = urllib2.urlopen(req)

Wireshark:

    GET / HTTP/1.1
    Accept-Encoding: identity
    Connection: close
    Host: www.python.org
    User-Agent: Python-urllib/2.6
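On Python 3, where urllib2 became urllib.request, the same behaviour can be checked without Wireshark by letting a throwaway local server record the Connection header it receives. This is only a sketch: the helper name connection_header_seen and the local http.server setup are illustrative assumptions, not part of the original report.

```python
import http.server
import threading
import urllib.request


def connection_header_seen(request_headers):
    """Send one urllib.request GET to a local server and return the
    Connection header value that the server actually received."""
    received = []

    class RecordingHandler(http.server.BaseHTTPRequestHandler):
        def do_GET(self):
            # Record the header before replying, so it is always captured.
            received.append(self.headers.get("Connection"))
            self.send_response(204)  # No body is needed for this check.
            self.end_headers()

        def log_message(self, *args):
            pass  # Keep the console quiet.

    server = http.server.HTTPServer(("127.0.0.1", 0), RecordingHandler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        url = "http://127.0.0.1:%d/" % server.server_address[1]
        req = urllib.request.Request(url, headers=request_headers)
        # An empty ProxyHandler keeps proxy environment variables out of the way.
        opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
        with opener.open(req) as resp:
            resp.read()
    finally:
        server.shutdown()
        server.server_close()
    return received[0]
```

Because do_open() overwrites the header, connection_header_seen({"Connection": "keep-alive"}) is expected to return "close", matching the capture above.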
msg170476 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2012-09-14 13:28
I've closed issue 15943 as a duplicate of this one. As I said there, I'm not sure that we (can?) support keep-alive in urllib, though we do in httplib (http.client in Python 3).
msg211387 - Author: Martin Panter (martin.panter) * (Python committer) Date: 2014-02-17 02:32
I suggest using setdefault() in urllib.request.AbstractHTTPHandler.do_open():

    headers.setdefault("Connection", "close")

I am trying to work around a server that truncates its response when this header is sent. This change would allow me to specify headers={"Connection": "Keep-Alive"} to get the same effect as dropping the Connection header. It is also consistent with the way the other headers (Accept-Encoding, User-Agent, Host) may be overridden.
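The effect of the proposed setdefault() change can be sketched as a stand-alone function. merge_request_headers is a hypothetical helper, and its merging steps (start from the unredirected headers, add the user headers that do not clash, title-case the names) are an approximation of what do_open() does in recent Python 3 releases, not a copy of it.

```python
def merge_request_headers(unredirected_hdrs, user_hdrs):
    """Combine request headers roughly the way do_open() does, but use
    setdefault() for Connection (hypothetical helper).

    Keys are expected in the style Request.add_header() stores them,
    e.g. "Connection" or "User-agent".
    """
    headers = dict(unredirected_hdrs)
    # User headers are only added where they do not clash with unredirected ones.
    headers.update({k: v for k, v in user_hdrs.items() if k not in headers})
    # Proposed change: fall back to "close" only if the caller set nothing.
    headers.setdefault("Connection", "close")
    # Normalise names for the wire, e.g. "User-agent" -> "User-Agent".
    return {name.title(): value for name, value in headers.items()}
```

With no user-supplied Connection header the result still contains "Connection: close"; with headers={"Connection": "Keep-Alive"} the caller's value survives.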
msg221006 - Author: Demian Brecht (demian.brecht) * (Python triager) Date: 2014-06-19 16:33
The problem here, as far as I can tell, is that the underlying file object (addinfourl) blocks while waiting for a full response from the server. As detailed in section 8.1 of RFC 2616, requests and responses can be pipelined, meaning a client can send further requests while still waiting for full responses from the server.

Allowing the header to be overridden is only a partial solution, as it doesn't allow for non-blocking pipelining.

@Martin Panter: My suggestion for you would simply be to use http.client (httplib), as R. David Murray suggested, since it doesn't auto-inject the Connection header. Also, a server that truncates responses when "Connection: close" is sent sounds like a server-side bug to me. Unless you maintain the server yourself, have you tried reaching out to its developers to request a fix?
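A minimal sketch of that suggestion, assuming a server that supports HTTP/1.1 persistent connections: http.client sends no Connection header of its own, so one HTTPConnection can be reused for several sequential requests, provided each response body is read in full before the next response is requested. fetch_over_one_connection is a hypothetical name. This is sequential connection reuse, not the pipelining described above.

```python
import http.client


def fetch_over_one_connection(host, port, paths, timeout=10):
    """GET each path in turn over a single HTTPConnection and return the
    response bodies (hypothetical helper)."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        bodies = []
        for path in paths:
            # http.client adds Host and Accept-Encoding but no Connection
            # header, so HTTP/1.1's persistent-connection default applies.
            conn.request("GET", path)
            response = conn.getresponse()
            # Read the whole body before asking for the next response;
            # otherwise the next getresponse() raises ResponseNotReady.
            bodies.append(response.read())
        return bodies
    finally:
        conn.close()
```

If the server does close the connection after a response, HTTPConnection reconnects automatically on the next request, so the function still works, just without the reuse.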
msg243879 - Author: Martin Panter (martin.panter) * (Python committer) Date: 2015-05-23 02:04
So far, the only reasons given for overriding this header (mine and the one in Issue 15943) seem to be working around buggy servers. It is already documented that HTTP/1.1 and “Connection: close” are used, so if this issue is only about working around buggy servers, the best thing might be to close it as “not a Python bug”. Users can still use the low-level HTTP client, or write a custom urllib.request handler class (which is what I did).
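One possible shape for such a handler, as a sketch for Python 3 and plain http:// URLs only (HTTPS would need the same treatment in https_open()). It is not necessarily the class Martin wrote. It relies on do_open() forwarding extra keyword arguments to the connection class and on HTTPConnection.request() sending each header through putheader(); both class names are hypothetical.

```python
import http.client
import urllib.request


class ConnectionOverrideHTTPConnection(http.client.HTTPConnection):
    """HTTPConnection that swaps the value of any Connection header it is
    asked to send for a caller-chosen one (hypothetical helper class)."""

    def __init__(self, *args, connection_value=None, **kwargs):
        super().__init__(*args, **kwargs)
        self._connection_value = connection_value

    def putheader(self, header, *values):
        name = header.decode("ascii") if isinstance(header, bytes) else header
        if self._connection_value is not None and name.lower() == "connection":
            # Replace the "close" inserted by do_open() with the caller's value.
            values = (self._connection_value,)
        super().putheader(header, *values)


class ConnectionOverrideHandler(urllib.request.HTTPHandler):
    """Handler for http:// URLs that honours a Connection header set on the
    Request instead of always sending "Connection: close"."""

    def http_open(self, req):
        # do_open() passes extra keyword arguments on to the connection class.
        return self.do_open(ConnectionOverrideHTTPConnection, req,
                            connection_value=req.get_header("Connection"))
```

A request built with headers={"Connection": "keep-alive"} and opened through urllib.request.build_opener(ConnectionOverrideHandler()) should then carry that value instead of "close"; requests without the header still get "close". As far as I can tell, do_open() still discards its connection after each response, so this only changes what the server is told and does not add persistent connections.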

Shubhojeet: What was the reason you wanted to set a keep-alive header?

If this is about proper keep-alive (a.k.a. persistent) connection support in urllib.request, perhaps have a look at Issue 9740.
msg250285 - Author: Martin Panter (martin.panter) * (Python committer) Date: 2015-09-09 05:07
Just closed Issue 25037, about a server that omits the chunk length headers when “Connection: close” is used.

I wonder if it would be such a bad idea to just remove the “Connection: close” header. It was added in 2004 in revision 5e7455fb8db6, but I do not agree with the reason given in the commit message and comment. Adding the header is really only a courtesy to the server, telling it that it can drop the connection once it sends the response. In theory, removing it shouldn't change anything about how the client parses the HTTP response, but in practice it may improve compatibility with buggy servers.
History
Date                 User                    Action  Args
2018-08-14 05:42:56  martin.panter           link    issue34357 superseder
2015-09-09 05:07:21  martin.panter           set     status: pending -> open
                                                     messages: + msg250285
2015-09-09 04:30:37  martin.panter           link    issue25037 superseder
2015-05-23 02:04:27  martin.panter           set     status: open -> pending
                                                     resolution: not a bug
                                                     messages: + msg243879
2015-04-02 03:07:33  s7v7nislands@gmail.com  set     nosy: + s7v7nislands@gmail.com
2015-02-13 01:23:31  demian.brecht           set     nosy: - demian.brecht
2014-06-19 16:33:29  demian.brecht           set     nosy: + demian.brecht
                                                     messages: + msg221006
2014-02-17 02:32:28  martin.panter           set     messages: + msg211387
2014-02-17 00:55:59  martin.panter           set     nosy: + martin.panter
2012-09-17 23:27:58  jcea                    set     nosy: + jcea
2012-09-14 13:29:47  r.david.murray          set     title: urllib2 headers issue -> Cannot override 'connection: close' in urllib2 headers
2012-09-14 13:28:57  r.david.murray          set     nosy: + sanxiago, r.david.murray
                                                     messages: + msg170476
                                                     versions: + Python 2.7, Python 3.2, Python 3.3, - Python 2.6
2012-09-14 13:26:20  r.david.murray          link    issue15943 superseder
2011-08-28 19:19:27  shubhojeet.ghosh        create