Message 89521 - Python tracker


This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

Author	ezio.melotti
Recipients	ezio.melotti
Date	2009-06-19.13:53:25
SpamBayes Score	2.164935e-15
Marked as misclassified	No
Message-id	<1245419610.47.0.800823464354.issue6312@psf.upfronthosting.co.za>
In-reply-to

Content

Try this code (youtube.com uses "transfer-encoding: chunked"):

import httplib
url = 'www.youtube.com'
conn = httplib.HTTPConnection(url)
conn.request('HEAD', '/')  # send a HEAD request
res = conn.getresponse()
print res.getheader('transfer-encoding')

so far it works fine, but when you try:

res.read()

it just hangs there; interrupting it with Ctrl+C shows where:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Programs\Python26\lib\httplib.py", line 517, in read
    return self._read_chunked(amt)
  File "C:\Programs\Python26\lib\httplib.py", line 553, in _read_chunked
    line = self.fp.readline()
  File "C:\Programs\Python26\lib\socket.py", line 395, in readline
    data = recv(1)
KeyboardInterrupt

If youtube.com is replaced with a site that doesn't use
"transfer-encoding: chunked" (e.g. url = 'dpaste.com'), res.read()
returns an empty string.
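The first case can also be reproduced without relying on youtube.com, using a local server that answers HEAD with "Transfer-Encoding: chunked" and no body. This is a Python 3.7+ sketch using http.server and http.client; ChunkedHeadHandler and head_and_read are hypothetical names. On recent Python 3 releases http.client treats any response to HEAD as having no body, so read() is expected to return b'' here; the timeout only guards against a hang on an affected version.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class ChunkedHeadHandler(BaseHTTPRequestHandler):
    # HTTP/1.1, so that "Transfer-Encoding: chunked" is a meaningful header.
    protocol_version = "HTTP/1.1"

    def do_HEAD(self):
        # Advertise a chunked body (as www.youtube.com did) but send none:
        # a response to HEAD never carries a body.
        self.send_response(200)
        self.send_header("Transfer-Encoding", "chunked")
        self.end_headers()

    def log_message(self, format, *args):
        pass  # silence per-request logging


def head_and_read(host, port):
    # timeout=5: an affected client fails with a timeout instead of hanging.
    conn = http.client.HTTPConnection(host, port, timeout=5)
    try:
        conn.request("HEAD", "/")  # send a HEAD request
        res = conn.getresponse()
        te = res.getheader("transfer-encoding")
        body = res.read()  # the call that hung in the report
        return te, body
    finally:
        conn.close()


# Port 0 lets the OS pick any free port.
server = ThreadingHTTPServer(("127.0.0.1", 0), ChunkedHeadHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
try:
    te, body = head_and_read("127.0.0.1", server.server_address[1])
finally:
    server.shutdown()
    server.server_close()

print(te, repr(body))
```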



When a HEAD request is sent, the server doesn't return the content of
the page, so there should be no reason to call .read(), but try this:

import urllib2

class HeadRequest(urllib2.Request):
    def get_method(self):
        return 'HEAD'

url = 'http://www.youtube.com/watch?v=tCVqx2b-c7U'
# Note: I had this problem with this URL because the video
# is not available in my country (Finland); it may work
# fine from other countries

req = HeadRequest(url)
page = urllib2.urlopen(req)
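For comparison, a Python 3 sketch of the same HEAD request with urllib.request, urllib2's successor. Since Python 3.3, Request accepts a method argument, so the subclass is optional; the subclass from the report still works too. Building the Request objects does not open a connection; only urllib.request.urlopen(req) would.

```python
import urllib.request

url = "http://www.youtube.com/watch?v=tCVqx2b-c7U"

# Python 3.3+: pass the method directly instead of subclassing.
req = urllib.request.Request(url, method="HEAD")
print(req.get_method())  # HEAD


# The subclass approach from the report, ported to urllib.request.
class HeadRequest(urllib.request.Request):
    def get_method(self):
        return "HEAD"


print(HeadRequest(url).get_method())  # HEAD
```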


This is what happens here with Python 2.5:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.5/urllib2.py", line 124, in urlopen
    return _opener.open(url, data)
  File "/usr/lib/python2.5/urllib2.py", line 387, in open
    response = meth(req, response)
  File "/usr/lib/python2.5/urllib2.py", line 498, in http_response
    'http', request, response, code, msg, hdrs)
  File "/usr/lib/python2.5/urllib2.py", line 419, in error
    result = self._call_chain(*args)
  File "/usr/lib/python2.5/urllib2.py", line 360, in _call_chain
    result = func(*args)
  File "/usr/lib/python2.5/urllib2.py", line 579, in http_error_302
    fp.read()
  File "/usr/lib/python2.5/socket.py", line 291, in read
    data = self._sock.recv(recv_size)
  File "/usr/lib/python2.5/httplib.py", line 509, in read
    return self._read_chunked(amt)
  File "/usr/lib/python2.5/httplib.py", line 548, in _read_chunked
    chunk_left = int(line, 16)
ValueError: invalid literal for int() with base 16: ''


With Python 2.6 the error is slightly different:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Programs\Python26\lib\urllib2.py", line 124, in urlopen
    return _opener.open(url, data, timeout)
  File "C:\Programs\Python26\lib\urllib2.py", line 389, in open
    response = meth(req, response)
  File "C:\Programs\Python26\lib\urllib2.py", line 502, in http_response
    'http', request, response, code, msg, hdrs)
  File "C:\Programs\Python26\lib\urllib2.py", line 421, in error
    result = self._call_chain(*args)
  File "C:\Programs\Python26\lib\urllib2.py", line 361, in _call_chain
    result = func(*args)
  File "C:\Programs\Python26\lib\urllib2.py", line 594, in http_error_302
    fp.read()
  File "C:\Programs\Python26\lib\socket.py", line 327, in read
    data = self._sock.recv(rbufsize)
  File "C:\Programs\Python26\lib\httplib.py", line 517, in read
    return self._read_chunked(amt)
  File "C:\Programs\Python26\lib\httplib.py", line 563, in _read_chunked
    raise IncompleteRead(value)
httplib.IncompleteRead

With Py3.0 it is the same:
[...]
http.client.IncompleteRead: b''


In this case self.fp.readline() (and the data = recv(1) in socket.py)
returns, and the error is raised a few lines later.
This seems to happen when there's a redirection in between: the video is
not available in my country, so the server sends back a 303 status code
and redirects me to the home page. The redirection is not handled by
httplib, so there might be something wrong in urllib2 too (why is it
trying to read the content if we sent a HEAD request and there is a
redirection in between?), but fixing httplib to return an empty string
or something similar could be enough to solve this problem too. If
there's actually a problem in urllib2, another issue should probably be
created.
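A minimal Python 3 sketch of the kind of fix suggested above: decide from the request method and status whether a body can exist at all (HEAD, 1xx, 204 and 304 responses never have one under the HTTP/1.1 message-length rules), and make the chunked reader report a premature EOF explicitly instead of letting int('', 16) fail. has_body, read_chunked, read_body and IncompleteBody are hypothetical names, not httplib's actual API.

```python
import io
from typing import BinaryIO


class IncompleteBody(Exception):
    """Hypothetical error for this sketch (stands in for IncompleteRead)."""

    def __init__(self, partial: bytes):
        super().__init__(f"body ended early after {len(partial)} bytes")
        self.partial = partial


def has_body(method: str, status: int) -> bool:
    # Responses to HEAD, 1xx, 204 and 304 never carry a body,
    # whatever Transfer-Encoding or Content-Length says.
    return not (method == "HEAD" or 100 <= status < 200 or status in (204, 304))


def read_chunked(fp: BinaryIO) -> bytes:
    parts = []
    while True:
        line = fp.readline()
        if not line:
            # EOF where a chunk-size line was expected: int(b"", 16) would
            # raise ValueError, which is the failure in the tracebacks above.
            raise IncompleteBody(b"".join(parts))
        size = int(line.split(b";", 1)[0].strip(), 16)  # ignore chunk extensions
        if size == 0:
            # Last chunk: skip trailer fields up to the terminating blank line.
            while fp.readline() not in (b"\r\n", b"\n", b""):
                pass
            return b"".join(parts)
        data = fp.read(size)
        if len(data) < size:
            raise IncompleteBody(b"".join(parts) + data)
        parts.append(data)
        fp.readline()  # CRLF that follows the chunk data


def read_body(method: str, status: int, fp: BinaryIO, chunked: bool) -> bytes:
    if not has_body(method, status):
        return b""  # never touch the socket for HEAD and friends
    return read_chunked(fp) if chunked else fp.read()


wire = io.BytesIO(b"5\r\nhello\r\n6\r\n world\r\n0\r\n\r\n")
print(read_body("GET", 200, wire, chunked=True))  # b'hello world'
print(read_body("HEAD", 200, io.BytesIO(), chunked=True))  # b''
```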

With the same code and the URL of a working YouTube video (no
redirection in between), "page = urllib2.urlopen(req)" works even though
the response uses "transfer-encoding: chunked", but it fails later when
calling "page.read()" (traceback from Python 3.0):

Traceback (most recent call last):
  File "C:\Programs\Python30\lib\http\client.py", line 520, in _read_chunked
    chunk_left = int(line, 16)
ValueError: invalid literal for int() with base 16: ''

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Programs\Python30\lib\http\client.py", line 479, in read
    return self._read_chunked(amt)
  File "C:\Programs\Python30\lib\http\client.py", line 525, in _read_chunked
    raise IncompleteRead(value)
http.client.IncompleteRead: b''
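Both urllib cases (the 303 redirect and the plain HEAD followed by read()) can be reproduced with a local server too. This Python 3.7+ sketch uses urllib.request; FakeVideoHandler and the /blocked and /video paths are hypothetical stand-ins for the YouTube URLs. Depending on the Python version, the redirected request may be sent as GET or HEAD, so the handler answers both; proxies are disabled so the requests stay on 127.0.0.1.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class FakeVideoHandler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"

    def do_HEAD(self):
        if self.path == "/blocked":
            # Like the video that is unavailable in Finland: 303 to the home page.
            self.send_response(303)
            self.send_header("Location", "/")
        else:
            self.send_response(200)
        # Chunked is advertised, but a HEAD response never carries a body.
        self.send_header("Transfer-Encoding", "chunked")
        self.end_headers()

    def do_GET(self):
        # Only reached if the redirect is followed with GET; empty body.
        self.send_response(200)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, format, *args):
        pass  # silence per-request logging


# An empty ProxyHandler disables environment proxies for this opener.
opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))

server = ThreadingHTTPServer(("127.0.0.1", 0), FakeVideoHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
base = "http://127.0.0.1:%d" % server.server_address[1]
try:
    # HEAD that gets redirected (the case that raised IncompleteRead).
    req = urllib.request.Request(base + "/blocked", method="HEAD")
    with opener.open(req, timeout=5) as page:
        redirected = (page.status, page.url, page.read())
    # Plain HEAD followed by read() (the last traceback above).
    req = urllib.request.Request(base + "/video", method="HEAD")
    with opener.open(req, timeout=5) as page:
        direct = (page.status, page.read())
finally:
    server.shutdown()
    server.server_close()

print(redirected)
print(direct)
```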

History
Date	User	Action	Args
2009-06-19 13:53:31	ezio.melotti	set	recipients: + ezio.melotti
2009-06-19 13:53:30	ezio.melotti	set	messageid: <1245419610.47.0.800823464354.issue6312@psf.upfronthosting.co.za>
2009-06-19 13:53:28	ezio.melotti	link	issue6312 messages
2009-06-19 13:53:26	ezio.melotti	create