classification
Title: .read(0) on http.client.HTTPResponse drops the rest of the content
Type: behavior Stage: committed/rejected
Components: Versions: Python 3.4, Python 3.3, Python 2.7
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: serhiy.storchaka Nosy List: orsenthil, python-dev, serhiy.storchaka, ssapin
Priority: normal Keywords: patch

Created on 2013-12-17 14:29 by ssapin, last changed 2013-12-17 20:05 by serhiy.storchaka. This issue is now closed.

Files
File name Uploaded Description
python-issue20007.diff ssapin, 2013-12-17 14:36 Patch fixing the issue and adding a test
Messages (6)
msg206446 - Author: Simon Sapin (ssapin) Date: 2013-12-17 14:29
When given a file-like object, html5lib calls .read(0) in order to check if the result is bytes or Unicode:

https://github.com/html5lib/html5lib-python/blob/e269a2fd0aafcd83af7cf1e65bba65c0e5a2c18b/html5lib/inputstream.py#L434

When given the result of urllib.request.urlopen(), html5lib parses an empty document because of this bug.
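
To illustrate the probing idiom described above, here is a minimal sketch of a zero-length read used to tell byte streams from text streams. The function name is_binary_stream is hypothetical and is not html5lib's actual code; it only assumes the object has a file-like .read() method.

```python
import io


def is_binary_stream(stream):
    """Guess whether a file-like object yields bytes (hypothetical helper).

    A zero-length read should consume nothing from a well-behaved stream,
    so only the type of the returned (empty) value is inspected.
    """
    return isinstance(stream.read(0), bytes)


binary = io.BytesIO(b"<p>hello</p>")
text = io.StringIO("<p>hello</p>")

print(is_binary_stream(binary))  # bytes-based stream
print(is_binary_stream(text))    # str-based stream

# The probe must not disturb the stream: the full content is still readable.
print(binary.read())
```

The idiom relies on read(0) having no side effects, which is exactly the assumption that the affected HTTPResponse breaks.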

Test case:

>>> from urllib.request import urlopen
>>> response = urlopen('http://python.org')
>>> response.read(0)
b''
>>> len(response.read())
0

For comparison:

>>> response = urlopen('http://python.org')
>>> len(response.read())
20317
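
The same check can probably be run without network access. The sketch below feeds a canned response to http.client.HTTPResponse through a stand-in socket. FakeSocket, RAW_RESPONSE and read_after_zero are hypothetical names introduced for illustration, and the 5-byte body is made up.

```python
import io
from http.client import HTTPResponse

# Canned HTTP response with a made-up 5-byte body (illustrative data only).
RAW_RESPONSE = b"HTTP/1.1 200 OK\r\nContent-Length: 5\r\n\r\nhello"


class FakeSocket:
    """Stand-in for a socket: HTTPResponse only needs makefile() here."""

    def __init__(self, payload):
        self._file = io.BytesIO(payload)

    def makefile(self, *args, **kwargs):
        return self._file


def read_after_zero(payload):
    """Call read(0) first, then return whatever read() still yields."""
    response = HTTPResponse(FakeSocket(payload))
    response.begin()   # parse the status line and headers
    response.read(0)   # the zero-length probe from the report
    return response.read()


print(read_after_zero(RAW_RESPONSE))
```

On an affected interpreter this is expected to print b'', matching the session above; on a version that includes the fix, the whole body should come back.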

The bug is here:

http://hg.python.org/cpython/file/d489394a73de/Lib/http/client.py#l541

'if not n:' assumes that "zero bytes have been read" indicates EOF, which is not the case when we ask for zero bytes.
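
A toy model of that failure mode, under stated assumptions: this is not the http.client source or the attached patch. BuggyReader and FixedReader are hypothetical classes that wrap a BytesIO, and the extra condition in FixedReader is just one possible guard.

```python
import io


class BuggyReader:
    """Toy wrapper that treats every empty read result as end-of-file."""

    def __init__(self, raw):
        self.fp = io.BytesIO(raw)

    def _is_eof(self, data, amt):
        # Flawed check: an empty result is also what read(0) returns.
        return not data

    def read(self, amt=None):
        if self.fp is None:
            return b""  # already closed: nothing left to return
        data = self.fp.read() if amt is None else self.fp.read(amt)
        if self._is_eof(data, amt):
            self.fp.close()
            self.fp = None
        return data


class FixedReader(BuggyReader):
    """Same wrapper, but an empty result only means EOF if bytes were requested."""

    def _is_eof(self, data, amt):
        return not data and amt != 0


buggy = BuggyReader(b"hello")
buggy.read(0)
print(buggy.read())   # the remaining content is lost

fixed = FixedReader(b"hello")
fixed.read(0)
print(fixed.read())   # the remaining content is still available
```

The point of the guard is that "nothing was returned" is only evidence of EOF when something was asked for.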
msg206448 - Author: Simon Sapin (ssapin) Date: 2013-12-17 14:36
Adding a proposed patch.
msg206458 - Author: Simon Sapin (ssapin) Date: 2013-12-17 15:04
html5lib issue: https://github.com/html5lib/html5lib-python/issues/127
msg206459 - Author: Simon Sapin (ssapin) Date: 2013-12-17 15:06
I could reproduce on 3.3.3 and tip, but not 3.2.3 or 2.7.6.
msg206477 - Author: Roundup Robot (python-dev) Date: 2013-12-17 19:54
New changeset ebace0a5a33e by Serhiy Storchaka in branch '2.7':
Issue #20007: HTTPResponse.read(0) no more prematurely closes connection.
http://hg.python.org/cpython/rev/ebace0a5a33e

New changeset 47ae858cd661 by Serhiy Storchaka in branch '3.3':
Issue #20007: HTTPResponse.read(0) no more prematurely closes connection.
http://hg.python.org/cpython/rev/47ae858cd661

New changeset d032245a122c by Serhiy Storchaka in branch 'default':
Issue #20007: HTTPResponse.read(0) no more prematurely closes connection.
http://hg.python.org/cpython/rev/d032245a122c
msg206479 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-12-17 20:05
2.7 is affected too.

Thank you Simon Sapin for your contribution.
History
Date User Action Args
2013-12-17 20:05:17 serhiy.storchaka set status: open -> closed
versions: + Python 2.7
messages: + msg206479
resolution: fixed
stage: patch review -> committed/rejected
2013-12-17 19:54:14 python-dev set nosy: + python-dev
messages: + msg206477
2013-12-17 17:52:47 serhiy.storchaka set assignee: serhiy.storchaka
2013-12-17 17:52:05 serhiy.storchaka set versions: - Python 3.5
2013-12-17 15:06:49 ssapin set messages: + msg206459
versions: + Python 3.5, - Python 2.7
2013-12-17 15:04:04 ssapin set messages: + msg206458
2013-12-17 14:59:10 serhiy.storchaka set nosy: + orsenthil, serhiy.storchaka
stage: patch review
versions: + Python 2.7, Python 3.3, Python 3.4
2013-12-17 14:36:46 ssapin set files: + python-issue20007.diff
keywords: + patch
messages: + msg206448
2013-12-17 14:29:41 ssapin create