Title: .read(0) on http.client.HTTPResponse drops the rest of the content
msg206446 - (view) Author: Simon Sapin (ssapin) Date: 2013-12-17 14:29
When given a file-like object, html5lib calls .read(0) in order to check if the result is bytes or Unicode.
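
A minimal sketch of that kind of probe, for illustration only. The helper name `is_byte_stream` is hypothetical and is not html5lib's actual code; the point is just that a zero-length read reveals the stream's type and should not consume any data.

```python
import io


def is_byte_stream(stream):
    """Return True if the stream yields bytes, False if it yields str.

    Hypothetical helper: a zero-length read returns an empty object of the
    stream's native type and should not consume anything.
    """
    return isinstance(stream.read(0), bytes)


binary = io.BytesIO(b"<p>hello</p>")
text = io.StringIO("<p>hello</p>")

print(is_byte_stream(binary))  # bytes stream
print(is_byte_stream(text))    # text stream
print(binary.read())           # the content is still fully available
```

On a well-behaved file-like object the probe leaves the stream position untouched, which is the behaviour a caller would expect from HTTPResponse as well.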

When given the result of urllib.request.urlopen(), it parses an empty document because of this bug.

Test case:

>>> from urllib.request import urlopen
>>> response = urlopen('')
>>> len(

For comparison:

>>> response = urlopen('')
>>> len(

The bug is here:

'if not n:' assumes that "zero bytes have been read" indicates EOF, which is not the case when we ask for zero bytes.
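
The failure mode can be reproduced without a network connection. The sketch below is not the http.client source. `BuggyReader` and `FixedReader` are hypothetical classes wrapping io.BytesIO that only mimic the pattern described above: closing the stream whenever a read returns nothing, versus doing so only when a non-zero amount was requested.

```python
import io


class BuggyReader:
    """Hypothetical reader that treats any empty read as end of stream."""

    def __init__(self, data):
        self._fp = io.BytesIO(data)

    def read(self, amt):
        if self._fp is None:
            return b""
        chunk = self._fp.read(amt)
        if not chunk:
            # Bug: an empty result also occurs when amt == 0,
            # so a zero-length read discards the rest of the content.
            self._fp = None
        return chunk


class FixedReader(BuggyReader):
    """Same reader, but only a non-zero request can signal EOF."""

    def read(self, amt):
        if self._fp is None:
            return b""
        chunk = self._fp.read(amt)
        if not chunk and amt:
            self._fp = None
        return chunk


def probe_then_read(reader_cls, data=b"hello world"):
    reader = reader_cls(data)
    reader.read(0)           # type probe, as html5lib does
    return reader.read(100)  # attempt to read the body


print(probe_then_read(BuggyReader))  # body is lost after the probe
print(probe_then_read(FixedReader))  # body survives the probe
```

Whether the attached patch uses exactly this condition is an assumption; the sketch only shows that the requested size has to be taken into account before an empty result is treated as EOF.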
msg206448 - (view) Author: Simon Sapin (ssapin) Date: 2013-12-17 14:36
Adding a proposed patch.
msg206458 - (view) Author: Simon Sapin (ssapin) Date: 2013-12-17 15:04
html5lib issue:
msg206459 - (view) Author: Simon Sapin (ssapin) Date: 2013-12-17 15:06
I could reproduce on 3.3.3 and tip, but not 3.2.3 or 2.7.6.
msg206477 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2013-12-17 19:54
New changeset ebace0a5a33e by Serhiy Storchaka in branch '2.7':
Issue #20007: HTTPResponse.read(0) no more prematurely closes connection.

New changeset 47ae858cd661 by Serhiy Storchaka in branch '3.3':
Issue #20007: HTTPResponse.read(0) no more prematurely closes connection.

New changeset d032245a122c by Serhiy Storchaka in branch 'default':
Issue #20007: HTTPResponse.read(0) no more prematurely closes connection.
msg206479 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-12-17 20:05
2.7 is affected too.

Thank you Simon Sapin for your contribution.
