classification
Title: httplib fails with HEAD requests to pages with "transfer-encoding: chunked"
Type: behavior
Stage: resolved
Components: Library (Lib)
Versions: Python 3.2, Python 3.1, Python 2.7, Python 2.6
process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To: orsenthil
Nosy List: Arfrever, chkneo, djc, ezio.melotti, mykhal, orsenthil, rcoup
Priority:
Keywords: patch

Created on 2009-06-19 13:53 by ezio.melotti, last changed 2010-06-04 17:33 by orsenthil. This issue is now closed.

Files
File name  Uploaded                  Description
6312.diff  chkneo, 2009-06-29 17:03  patch for Lib/http/client.py
Messages (11)
msg89521 - (view) Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2009-06-19 13:53
Try this code (youtube.com uses "transfer-encoding: chunked"):

import httplib
url = 'www.youtube.com'
conn = httplib.HTTPConnection(url)
conn.request('HEAD', '/')  # send a HEAD request
res = conn.getresponse()
print res.getheader('transfer-encoding')

so far it works fine, but when you try:

res.read()

it just hangs there, where "there" is:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Programs\Python26\lib\httplib.py", line 517, in read
    return self._read_chunked(amt)
  File "C:\Programs\Python26\lib\httplib.py", line 553, in _read_chunked
    line = self.fp.readline()
  File "C:\Programs\Python26\lib\socket.py", line 395, in readline
    data = recv(1)
KeyboardInterrupt

If we replace youtube.com with the URL of a site that doesn't use
"transfer-encoding: chunked" (e.g. url = 'dpaste.com'),
res.read() returns an empty string.



When a HEAD request is sent, the content of the page is not returned,
so there should be no point in calling .read(), but try this:

import urllib2

class HeadRequest(urllib2.Request):
    def get_method(self):
        return 'HEAD'

url = 'http://www.youtube.com/watch?v=tCVqx2b-c7U'
# Note: I had this problem with this URL, the video 
# is not available in my country (Finland) and it
# may work fine for other countries

req = HeadRequest(url)
page = urllib2.urlopen(req)


This is what happens here with Python 2.5:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.5/urllib2.py", line 124, in urlopen
    return _opener.open(url, data)
  File "/usr/lib/python2.5/urllib2.py", line 387, in open
    response = meth(req, response)
  File "/usr/lib/python2.5/urllib2.py", line 498, in http_response
    'http', request, response, code, msg, hdrs)
  File "/usr/lib/python2.5/urllib2.py", line 419, in error
    result = self._call_chain(*args)
  File "/usr/lib/python2.5/urllib2.py", line 360, in _call_chain
    result = func(*args)
  File "/usr/lib/python2.5/urllib2.py", line 579, in http_error_302
    fp.read()
  File "/usr/lib/python2.5/socket.py", line 291, in read
    data = self._sock.recv(recv_size)
  File "/usr/lib/python2.5/httplib.py", line 509, in read
    return self._read_chunked(amt)
  File "/usr/lib/python2.5/httplib.py", line 548, in _read_chunked
    chunk_left = int(line, 16)
ValueError: invalid literal for int() with base 16: ''


With Python 2.6 the error is slightly different:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Programs\Python26\lib\urllib2.py", line 124, in urlopen
    return _opener.open(url, data, timeout)
  File "C:\Programs\Python26\lib\urllib2.py", line 389, in open
    response = meth(req, response)
  File "C:\Programs\Python26\lib\urllib2.py", line 502, in http_response
    'http', request, response, code, msg, hdrs)
  File "C:\Programs\Python26\lib\urllib2.py", line 421, in error
    result = self._call_chain(*args)
  File "C:\Programs\Python26\lib\urllib2.py", line 361, in _call_chain
    result = func(*args)
  File "C:\Programs\Python26\lib\urllib2.py", line 594, in http_error_302
    fp.read()
  File "C:\Programs\Python26\lib\socket.py", line 327, in read
    data = self._sock.recv(rbufsize)
  File "C:\Programs\Python26\lib\httplib.py", line 517, in read
    return self._read_chunked(amt)
  File "C:\Programs\Python26\lib\httplib.py", line 563, in _read_chunked
    raise IncompleteRead(value)
httplib.IncompleteRead

With Py3.0 it is the same:
[...]
http.client.IncompleteRead: b''


In this case self.fp.readline() (and the data = recv(1) in socket.py)
returns, and the error happens a few lines later.
This seems to happen when there's a redirection in between (the video is
not available in my country, so the server sends back a 303 status code
and redirects me to the home page). The redirection is not handled by
httplib, so there might be something wrong in urllib2 too (why is it
trying to read the content if we sent a HEAD request and there is a
redirection in between?), but fixing httplib to return an empty string
or something similar could be enough to solve this problem too. If
there's actually a problem in urllib2, another issue should probably be
created.

With the same code and the URL of a working youtube video (no
redirections in between), "page = urllib2.urlopen(req)" works even with
"transfer-encoding: chunked", but it fails later if we do
"page.read()":

Traceback (most recent call last):
  File "C:\Programs\Python30\lib\http\client.py", line 520, in _read_chunked
    chunk_left = int(line, 16)
ValueError: invalid literal for int() with base 16: ''

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Programs\Python30\lib\http\client.py", line 479, in read
    return self._read_chunked(amt)
  File "C:\Programs\Python30\lib\http\client.py", line 525, in _read_chunked
    raise IncompleteRead(value)
http.client.IncompleteRead: b''
msg89868 - (view) Author: Chandru (chkneo) Date: 2009-06-29 17:03
A HEAD request won't return any data, so before calling _read_chunked we
have to check whether amt is None. If it is None, simply return b''.

I've attached the patch too, which was made against the py3k branch.
msg99796 - (view) Author: Michal Božoň (mykhal) Date: 2010-02-22 17:52
I confirm.

In my case, the bug manifested when calling the HEAD method on a different server with chunked transfer encoding (http://obrazky.cz).

My workaround is to always call response.read(), except when method == 'HEAD' and resp.getheader('transfer-encoding') == 'chunked'.
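
The workaround described above could be sketched roughly as follows. This is only an illustration, not code from the tracker: `read_body` is a hypothetical helper name, and the explicit `close()` call is an added assumption so that the connection can still be re-used after the read is skipped.

```python
def read_body(response, method):
    """Read a response body, skipping read() for chunked HEAD responses.

    `read_body` is a hypothetical helper, not part of httplib/http.client.
    `response` is expected to offer getheader(), read() and close().
    """
    # The header may be missing (None) or use a different letter case.
    encoding = (response.getheader('transfer-encoding') or '').lower()
    if method.upper() == 'HEAD' and encoding == 'chunked':
        # A HEAD response carries no body, so read() is skipped here.
        # Closing the response is an assumption added for this sketch,
        # so the underlying connection is released.
        response.close()
        return b''
    return response.read()
```

A caller would pass in the same method string it gave to conn.request(), e.g. read_body(conn.getresponse(), 'HEAD').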
msg104404 - (view) Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2010-04-28 03:38
I can take this up. A HEAD request does not return any data, so when the data is None and the transfer encoding is chunked, we can return an empty value for the next step. There is no need to attempt to read the chunked amt. The patch is fine, and tests need to be added.
msg104443 - (view) Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2010-04-28 17:48
Whenever the HEAD method is used, httplib now recognizes it in its read method and returns an empty string ('') as expected.

Fixed in revision 80583, release26-maint: r80584, py3k: r80587 and release31-maint in 80588.
msg106457 - (view) Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2010-05-25 18:07
Thanks Senthil!
msg106520 - (view) Author: Dirkjan Ochtman (djc) * (Python committer) Date: 2010-05-26 10:40
The fix in r80583 is bad. It fails to close() the response (which previously worked as expected), meaning that the connection can't be re-used.

(I ran into this because Gentoo has backported the 2.6-maint fixes to their 2.6.5 distribution.)

Shall I open a new issue, or re-open this one?
msg106521 - (view) Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2010-05-26 11:12
I am just reopening this, as per djc's comment.
msg107076 - (view) Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2010-06-04 16:46
Fixed in r81687, r81688, r81689 and r81690.

Yes, I see that before the original change was made, any chunked encoding went through _read_chunked, which closes the resp before returning. So now, for HEAD, the resp is closed as well, fixing the problem mentioned by djc.
msg107077 - (view) Author: Dirkjan Ochtman (djc) * (Python committer) Date: 2010-06-04 17:06
Might be useful to have a test for this?
msg107080 - (view) Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2010-06-04 17:33
I saw that the earlier test was closing it explicitly. I removed that and added a test which verifies that the resp obj is closed. Thanks.
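
For illustration, a check along these lines can be sketched against Python 3's http.client (the successor of httplib). This is not the test that was committed: `FakeSocket` and `RAW_RESPONSE` are hypothetical names made up for the sketch, which feeds a canned chunked HEAD response to HTTPResponse without touching the network.

```python
import io
import http.client


class FakeSocket:
    """Hypothetical stand-in for a socket that replays canned bytes."""

    def __init__(self, data):
        self.data = data

    def makefile(self, mode, bufsize=None):
        # HTTPResponse only needs a readable binary file object.
        return io.BytesIO(self.data)


# A HEAD response: the headers announce chunked encoding, but no body follows.
RAW_RESPONSE = (
    b"HTTP/1.1 200 OK\r\n"
    b"Transfer-Encoding: chunked\r\n"
    b"\r\n"
)

resp = http.client.HTTPResponse(FakeSocket(RAW_RESPONSE), method="HEAD")
resp.begin()              # parse the status line and headers
body = resp.read()        # with the fix, returns at once instead of hanging
closed = resp.isclosed()  # with the follow-up fix, the response is closed
print(repr(body), closed)
```

On an interpreter that includes both fixes, body should be empty and the response should report itself as closed, which is what lets the connection be re-used.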
History
Date                 User          Action  Args
2010-06-04 17:33:10  orsenthil     set     messages: + msg107080
2010-06-04 17:06:46  djc           set     messages: + msg107077
2010-06-04 16:46:07  orsenthil     set     status: open -> closed
                                           priority: release blocker ->
                                           resolution: accepted -> fixed
                                           messages: + msg107076
2010-06-04 14:42:02  djc           set     priority: normal -> release blocker
2010-05-26 13:58:59  Arfrever      set     nosy: + Arfrever
2010-05-26 11:12:47  orsenthil     set     status: closed -> open
                                           resolution: fixed -> accepted
                                           messages: + msg106521
2010-05-26 10:40:36  djc           set     nosy: + djc
                                           messages: + msg106520
2010-05-25 18:07:35  ezio.melotti  set     messages: + msg106457
                                           versions: + Python 3.1, Python 3.2, - Python 2.5, Python 3.0
2010-04-28 17:48:21  orsenthil     set     status: open -> closed
                                           resolution: accepted -> fixed
                                           messages: + msg104443
                                           stage: patch review -> resolved
2010-04-28 03:38:07  orsenthil     set     nosy: + orsenthil
                                           messages: + msg104404
                                           assignee: orsenthil
                                           resolution: accepted
2010-04-28 03:23:46  rcoup         set     nosy: + rcoup
2010-02-22 17:52:44  mykhal        set     nosy: + mykhal
                                           messages: + msg99796
2009-06-30 23:06:00  ezio.melotti  set     stage: patch review
2009-06-29 17:03:26  chkneo        set     files: + 6312.diff
                                           nosy: + chkneo
                                           messages: + msg89868
                                           keywords: + patch
2009-06-19 13:53:28  ezio.melotti  create