Message 206446 - Python tracker


Author	ssapin
Recipients	ssapin
Date	2013-12-17.14:29:40
Message-id	<1387290581.03.0.808557642929.issue20007@psf.upfronthosting.co.za>
In-reply-to

Content

When given a file-like object, html5lib calls .read(0) in order to check if the result is bytes or Unicode:

https://github.com/html5lib/html5lib-python/blob/e269a2fd0aafcd83af7cf1e65bba65c0e5a2c18b/html5lib/inputstream.py#L434
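A minimal sketch of the kind of probe described above. The name `is_binary_stream` is hypothetical and is not html5lib's actual function; the point is only that a zero-byte read should return an empty object of the stream's native type without consuming any data.

```python
import io


def is_binary_stream(stream):
    """Return True if the stream yields bytes, False if it yields str.

    Hypothetical helper for illustration: read(0) asks for zero bytes,
    so a well-behaved stream returns b'' or '' and stays at the same position.
    """
    return isinstance(stream.read(0), bytes)


binary = io.BytesIO(b"<p>hello</p>")
text = io.StringIO("<p>hello</p>")

print(is_binary_stream(binary))  # True
print(is_binary_stream(text))    # False
print(binary.read())             # b'<p>hello</p>' (the probe consumed nothing)
```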

When given the result of urllib.request.urlopen(), html5lib parses an empty document because of this bug.

Test case:

>>> from urllib.request import urlopen
>>> response = urlopen('http://python.org')
>>> response.read(0)
b''
>>> len(response.read())
0

For comparison:

>>> response = urlopen('http://python.org')
>>> len(response.read())
20317

The bug is here:

http://hg.python.org/cpython/file/d489394a73de/Lib/http/client.py#l541

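'if not n:' assumes that "zero bytes have been read" indicates EOF. That does not hold when the caller asked for zero bytes: the read returns nothing, the response is treated as exhausted, and every later read() returns b''.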
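The failure mode can be reproduced offline with a toy model. `ToyResponse` and its `guard_zero_reads` flag are hypothetical and are not copied from Lib/http/client.py; the guarded branch shows one possible way to tell "nothing left" apart from "nothing requested", not necessarily the fix that was applied.

```python
import io


class ToyResponse:
    """Toy stand-in for a response object; not http.client's real code."""

    def __init__(self, payload, guard_zero_reads):
        self._fp = io.BytesIO(payload)
        self._guard = guard_zero_reads

    def read(self, amt=-1):
        if self._fp is None:
            return b""  # already treated as exhausted
        data = self._fp.read(amt)
        n = len(data)
        if self._guard:
            # An empty result only means EOF if some bytes were requested.
            at_eof = not n and amt != 0
        else:
            # The check described above: any empty result counts as EOF.
            at_eof = not n
        if at_eof:
            self._fp = None
        return data


buggy = ToyResponse(b"x" * 10, guard_zero_reads=False)
buggy.read(0)
print(len(buggy.read()))    # 0

guarded = ToyResponse(b"x" * 10, guard_zero_reads=True)
guarded.read(0)
print(len(guarded.read()))  # 10
```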

History
Date	User	Action	Args
2013-12-17 14:29:41	ssapin	set	recipients: + ssapin
2013-12-17 14:29:41	ssapin	set	messageid: <1387290581.03.0.808557642929.issue20007@psf.upfronthosting.co.za>
2013-12-17 14:29:40	ssapin	link	issue20007 messages
2013-12-17 14:29:40	ssapin	create