classification
Title: io.TextIOWrapper on urllib.request.urlopen terminates prematurely
Type: behavior Stage: resolved
Components: IO, Library (Lib) Versions: Python 3.2, Python 3.3, Python 3.4
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: serhiy.storchaka Nosy List: amaury.forgeotdarc, bow, dabeaz, mdehoon, orsenthil, pitrou, python-dev, serhiy.storchaka
Priority: normal Keywords: patch

Created on 2012-12-19 06:42 by mdehoon, last changed 2013-02-06 08:42 by serhiy.storchaka. This issue is now closed.

Files
File name Uploaded Description Edit
httpresponse_noclosed.patch serhiy.storchaka, 2013-01-26 20:08 review
Messages (10)
msg177726 - (view) Author: Michiel de Hoon (mdehoon) * Date: 2012-12-19 06:42
I am trying to use io.TextIOWrapper to wrap a handle returned by urllib.request.urlopen. Reading line-by-line from the wrapped handle terminates prematurely.

As an example, consider this script:

import urllib.request
import io

url = "http://www.python.org"
handle = urllib.request.urlopen(url)
wrapped_handle = io.TextIOWrapper(handle, encoding='utf-8')
for line in wrapped_handle:
    pass

This gives:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: I/O operation on closed file.

This happens after 335 of the 430 lines have been read. The last line read is "<p>The <a class="reference external" href="/psf/">Python Software Foundation</a> holds the intellectual property\n", which is line 335 on the www.python.org website.
msg177729 - (view) Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2012-12-19 09:55
Hum, io objects are not supposed to close themselves when they run out of data.
Even if HTTPResponse chooses to close the underlying socket (to clean unused resources?), it should not report itself as a closed io.IOBase.
Subsequent calls to read() should return b"", which is how io.RawIOBase indicates EOF.

To fix this particular example, it seems enough to delete the "@property def closed(self)" from HTTPResponse. Note the XXX just above:
"""
    # XXX This class should probably be revised to act more like
    # the "raw stream" that BufferedReader expects.
"""
But close() should be modified as well, and internal calls to close() should be changed to only close the underlying socket.
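
The mechanism described above can be reproduced without a network connection. The following is a minimal sketch, not the real HTTPResponse: FakeResponse and read_lines are hypothetical names, and the in-memory payload stands in for the HTTP body. With closed_at_eof=True the object reports itself as closed as soon as its last byte has been consumed, so TextIOWrapper refuses to hand out lines it has already buffered. With closed_at_eof=False, end-of-file is signalled only by an empty read.

```python
import io


class FakeResponse(io.RawIOBase):
    """Hypothetical stand-in for HTTPResponse, backed by in-memory bytes."""

    def __init__(self, payload, closed_at_eof):
        super().__init__()
        self._payload = io.BytesIO(payload)
        self._size = len(payload)
        self._closed_at_eof = closed_at_eof

    def readable(self):
        return True

    def readinto(self, buffer):
        # Returning 0 here makes RawIOBase.read() return b"", the EOF signal.
        return self._payload.readinto(buffer)

    @property
    def closed(self):
        # Mimics the reported behavior: look closed once the data runs out.
        if self._closed_at_eof and self._payload.tell() >= self._size:
            return True
        return super().closed


def read_lines(raw):
    """Iterate over a wrapped stream; return the lines read and any error."""
    wrapped = io.TextIOWrapper(raw, encoding="utf-8")
    lines = []
    error = None
    try:
        for line in wrapped:
            lines.append(line)
    except ValueError as exc:
        error = exc
    return lines, error


PAYLOAD = b"first\nsecond\nthird\n"
early_lines, early_error = read_lines(FakeResponse(PAYLOAD, closed_at_eof=True))
full_lines, full_error = read_lines(FakeResponse(PAYLOAD, closed_at_eof=False))
print(len(early_lines), repr(early_error))
print(len(full_lines), repr(full_error))
```

The whole payload fits in one chunk here, so the first variant fails while decoded lines are still waiting in the wrapper's buffer, which matches the shape of the original report.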
msg177730 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-12-19 09:59
This looks like a known bug in io.TextIOWrapper, which calls read() again even after a previous read() returned empty data. There was a related issue, but I can't find it now.
msg177731 - (view) Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2012-12-19 10:21
buffer.read() never returns empty data in this case.
msg177744 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-12-19 14:32
Indeed.
msg180050 - (view) Author: David Beazley (dabeaz) Date: 2013-01-15 20:54
I have run into this bug myself. I agree that a file-like object should never report itself as closed unless .close() has been explicitly called on it. HTTPResponse should not report itself as closed after end-of-file has been reached.

I think there is also a bug in the implementation of TextIOWrapper. Even if the underlying file reports itself as closed, previously read and buffered data should be processed before an error about the file being closed is reported.
msg180706 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-01-26 20:08
Here is a patch which fixes the HTTPResponse side. The "closed" property is no longer set automatically, but only after an explicit close().
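
The semantics described here can be illustrated with a small sketch. This is not the attached patch: SketchResponse and _release_source are hypothetical names, and a BytesIO object stands in for the socket file. The idea is to keep two states apart: isclosed() reports whether the underlying source has been released, while the io-level "closed" attribute changes only on an explicit close().

```python
import io


class SketchResponse(io.RawIOBase):
    """Hypothetical response object separating source state from io state."""

    def __init__(self, payload):
        super().__init__()
        self._source = io.BytesIO(payload)  # stands in for the socket file

    def readable(self):
        return True

    def readinto(self, buffer):
        if self._source is None:
            return 0  # source already released: keep signalling EOF
        count = self._source.readinto(buffer)
        if count == 0:
            # End of data: free the source, but do not mark the stream closed.
            self._release_source()
        return count

    def _release_source(self):
        source, self._source = self._source, None
        if source is not None:
            source.close()

    def isclosed(self):
        # Internal state only: True once the underlying source is gone.
        return self._source is None

    def close(self):
        # Only an explicit close() flips the io-level "closed" attribute.
        try:
            super().close()
        finally:
            self._release_source()


response = SketchResponse(b"a\nb\n")
text = io.TextIOWrapper(response, encoding="utf-8")
lines = list(text)
released_after_eof = response.isclosed()
closed_after_eof = response.closed
text.close()
closed_after_close = response.closed
print(lines, released_after_eof, closed_after_eof, closed_after_close)
```

With this split, a wrapper such as TextIOWrapper can drain its buffer after the source has been released, because the stream reports itself as closed only after close() has been called.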
msg181438 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-02-05 14:12
Senthil, Antoine, anyone, what do you think about this patch?
msg181474 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-02-05 19:52
This looks ok to me. I am slightly surprised that isclosed() isn't documented anywhere (but perhaps it's better that way).
msg181502 - (view) Author: Roundup Robot (python-dev) Date: 2013-02-06 08:38
New changeset 6cc5bbfcf04e by Serhiy Storchaka in branch '3.2':
Issue #16723: httplib.HTTPResponse no longer marked closed when the connection
http://hg.python.org/cpython/rev/6cc5bbfcf04e

New changeset 0461ed77ee4e by Serhiy Storchaka in branch '3.3':
Issue #16723: httplib.HTTPResponse no longer marked closed when the connection
http://hg.python.org/cpython/rev/0461ed77ee4e

New changeset 5f8c68281d18 by Serhiy Storchaka in branch 'default':
Issue #16723: httplib.HTTPResponse no longer marked closed when the connection
http://hg.python.org/cpython/rev/5f8c68281d18
History
Date User Action Args
2013-02-06 08:42:36 serhiy.storchaka set status: open -> closed; resolution: fixed; stage: patch review -> resolved
2013-02-06 08:38:53 python-dev set nosy: + python-dev; messages: + msg181502
2013-02-05 19:52:12 pitrou set messages: + msg181474
2013-02-05 14:12:57 serhiy.storchaka set messages: + msg181438
2013-01-31 14:38:36 serhiy.storchaka set assignee: serhiy.storchaka
2013-01-26 20:08:41 serhiy.storchaka set files: + httpresponse_noclosed.patch; components: + Library (Lib), IO; versions: + Python 3.2, Python 3.4; keywords: + patch; nosy: + orsenthil; messages: + msg180706; stage: patch review
2013-01-15 20:54:15 dabeaz set nosy: + dabeaz; messages: + msg180050
2012-12-19 14:32:02 serhiy.storchaka set messages: + msg177744
2012-12-19 13:37:41 bow set nosy: + bow
2012-12-19 10:21:49 amaury.forgeotdarc set messages: + msg177731
2012-12-19 09:59:27 serhiy.storchaka set nosy: + serhiy.storchaka; messages: + msg177730
2012-12-19 09:55:14 amaury.forgeotdarc set nosy: + amaury.forgeotdarc, pitrou; messages: + msg177729
2012-12-19 06:42:14 mdehoon create