classification
Title: IncompleteRead / BadStatus when parsing http://peakoil.mobi
Type: behavior Stage:
Components: Library (Lib) Versions: Python 3.1, Python 3.2, Python 2.7
process
Status: closed Resolution: not a bug
Dependencies: Superseder:
Assigned To: Nosy List: djc, kmoon, pitrou, yaubi
Priority: normal Keywords:

Created on 2009-08-26 13:51 by kmoon, last changed 2011-04-20 16:04 by yaubi. This issue is now closed.

Messages (3)
msg91972 - Author: Evan (kmoon) Date: 2009-08-26 13:51
(I'm brand new to Python.org, apologies in advance if this has been
recorded elsewhere or is not a bug)

I have a simple script that fetches a URL using httplib/urllib2 and then
searches the HTML for a string. It works on every URL I've tried
except http://peakoil.mobi where, for some reason, I get the
following:

Traceback (most recent call last):
  File "C:\Python26\lib\threading.py", line 524, in __bootstrap_inner
    self.run()
  File "test.py", line 59, in run
    html = response.read()
  File "C:\Python26\lib\socket.py", line 327, in read
    data = self._sock.recv(rbufsize)
  File "C:\Python26\lib\httplib.py", line 518, in read
    return self._read_chunked(amt)
  File "C:\Python26\lib\httplib.py", line 564, in _read_chunked
    raise IncompleteRead(value)
IncompleteRead: <html><head>

<title></title></head>
<!-- Redirection Services ASH01WRED02 H1 -->
<frameset rows='100%, *' frameborder=no framespacing=0 border=0>
<frame src="http://peakoil.com/modules.php?name=AvantGo" name=mainwindow frameborder=no framespacing=0 marginheight=0 marginwidth=0></frame>
</frameset>
<noframes>
<h2>Your browser does not support frames.  We recommend upgrading your browser.</h2><br><br>
<center>Click <a href="http://peakoil.com/modules.php?name=AvantGo">here</a> to enter the site.</center>
</noframes></html>

Exception in thread 1:
Traceback (most recent call last):
  File "C:\Python26\lib\threading.py", line 524, in __bootstrap_inner
    self.run()
  File "test.py", line 51, in run
    response = urllib2.urlopen(req)
  File "C:\Python26\lib\urllib2.py", line 124, in urlopen
    return _opener.open(url, data, timeout)
  File "C:\Python26\lib\urllib2.py", line 383, in open
    response = self._open(req, data)
  File "C:\Python26\lib\urllib2.py", line 401, in _open
    '_open', req)
  File "C:\Python26\lib\urllib2.py", line 361, in _call_chain
    result = func(*args)
  File "C:\Python26\lib\urllib2.py", line 1130, in http_open
    return self.do_open(httplib.HTTPConnection, req)
  File "C:\Python26\lib\urllib2.py", line 1103, in do_open
    r = h.getresponse()
  File "C:\Python26\lib\httplib.py", line 951, in getresponse
    response.begin()
  File "C:\Python26\lib\httplib.py", line 391, in begin
    version, status, reason = self._read_status()
  File "C:\Python26\lib\httplib.py", line 355, in _read_status
    raise BadStatusLine(line)
BadStatusLine
msg124059 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-12-15 19:53
That server simply doesn't respect the HTTP RFC. It fails to send a last "0" line to indicate that the chunked transfer has completed.
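
To illustrate the point, here is a minimal sketch of a chunked-body decoder, written for Python 3 and operating on bytes already held in memory. It is not httplib's actual implementation; `decode_chunked` and `MissingLastChunk` are hypothetical names used only for this example, and trailers after the last chunk are ignored. It shows why a body that stops without the final zero-size chunk cannot be told apart from a dropped connection.

```python
class MissingLastChunk(Exception):
    """Raised when the body ends before the terminating zero-size chunk.

    The data decoded so far is available as args[0].
    """


def decode_chunked(raw):
    """Decode an HTTP/1.1 chunked body held entirely in `raw` (bytes)."""
    body = b""
    pos = 0
    while True:
        eol = raw.find(b"\r\n", pos)
        if eol == -1:
            # No further chunk-size line: the stream stopped early.
            raise MissingLastChunk(body)
        # The chunk size is hexadecimal; drop any chunk extension after ';'.
        size_field = raw[pos:eol].split(b";", 1)[0].strip()
        size = int(size_field.decode("ascii"), 16)
        if size == 0:
            # The "0" line marks the end of the transfer.
            return body
        start = eol + 2
        chunk = raw[start:start + size]
        if len(chunk) < size:
            # The announced chunk is longer than the data received.
            raise MissingLastChunk(body + chunk)
        body += chunk
        pos = start + size + 2  # skip the CRLF that follows the chunk data


# A well-formed body ends with "0\r\n\r\n".
complete = b"5\r\nhello\r\n6\r\n world\r\n0\r\n\r\n"
# The same body without the last chunk, as described above.
truncated = b"5\r\nhello\r\n6\r\n world\r\n"
```

With these inputs, `decode_chunked(complete)` returns the full body, while `decode_chunked(truncated)` raises `MissingLastChunk` carrying the same bytes, much like the `IncompleteRead` in the report carries the HTML that was received.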
msg134161 - Author: Yoann Aubineau (yaubi) * Date: 2011-04-20 16:04
Chunked transfer encoding was introduced in HTTP/1.1, so sending an HTTP/1.0 request should prevent the server from using it.

The httplib module sends HTTP/1.1 requests by default but, as far as I know, offers no option to downgrade.

My suggestion would be to monkey patch httplib.HTTPConnection before using it:

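    import httplib

    # Force HTTP/1.0 request lines so the server should not reply with a
    # chunked body. These are private attributes and may change.
    httplib.HTTPConnection._http_vsn = 10
    httplib.HTTPConnection._http_vsn_str = 'HTTP/1.0'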
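
For the Python 3 versions listed on this issue, where httplib became http.client, the same idea could be scoped with a context manager so the patch is undone afterwards. This is only a sketch: `force_http10` is a hypothetical helper name, and `_http_vsn` / `_http_vsn_str` are private attributes that are not guaranteed to remain stable.

```python
import contextlib
import http.client


@contextlib.contextmanager
def force_http10():
    """Temporarily make http.client send HTTP/1.0 request lines.

    Assumption: relies on the private class attributes _http_vsn and
    _http_vsn_str of http.client.HTTPConnection.
    """
    cls = http.client.HTTPConnection
    saved = (cls._http_vsn, cls._http_vsn_str)
    cls._http_vsn, cls._http_vsn_str = 10, "HTTP/1.0"
    try:
        yield
    finally:
        # Restore the previous values even if the request fails.
        cls._http_vsn, cls._http_vsn_str = saved


# Usage (performs a network request, so it is left commented out):
#
#     import urllib.request
#     with force_http10():
#         html = urllib.request.urlopen("http://peakoil.mobi").read()
```

Note that the patch is applied to the class, so while the block is active it affects every connection in the process, including those opened by other threads such as the worker threads in the original script.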
History
Date                 User          Action  Args
2011-04-20 16:04:41  yaubi         set     nosy: + yaubi; messages: + msg134161
2010-12-28 11:39:24  georg.brandl  set     status: pending -> closed; nosy: pitrou, djc, kmoon
2010-12-15 19:53:34  pitrou        set     status: open -> pending; nosy: + pitrou; messages: + msg124059; resolution: not a bug
2010-08-03 14:25:06  djc           set     nosy: + djc
2010-07-11 09:21:22  BreamoreBoy   set     components: + Library (Lib); versions: + Python 3.1, Python 2.7, Python 3.2, - Python 2.6
2009-08-26 13:51:32  kmoon         create