Author maubp
Recipients maubp
Date 2016-03-07.10:10:52
Content
This is a regression in Python 3.5, tested under Linux and Mac OS X. It was spotted via a failing test in Biopython (https://github.com/biopython/biopython/issues/773), which parses a file fetched from the internet. The trigger is reading part of the network handle line by line (e.g. until an end-of-record marker is found) and then calling handle.read() to fetch any remaining data. Self-contained examples are below.

Note that a partial read with handle.read(n), followed by handle.read() for the rest, still works:


$ python3.5
Python 3.5.0 (default, Sep 14 2015, 12:13:24) 
[GCC 4.2.1 Compatible Apple LLVM 6.1.0 (clang-602.0.53)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> 
>>> from urllib.request import urlopen
>>> handle = urlopen("http://www.python.org")
>>> chunk = handle.read(50)
>>> rest = handle.read()
>>> handle.close()


However, the following variants read a few lines and then call handle.read(), which fails. The URL is not important, as long as the response has more than four lines.

Using readline,


>>> from urllib.request import urlopen
>>> handle = urlopen("http://www.python.org")
>>> for i in range(4):
...     line = handle.readline()
... 
>>> rest = handle.read()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/xxx/lib/python3.5/http/client.py", line 446, in read
    s = self._safe_read(self.length)
  File "/Users/xxx/lib/python3.5/http/client.py", line 594, in _safe_read
    raise IncompleteRead(b''.join(s), amt)
http.client.IncompleteRead: IncompleteRead(46698 bytes read, 259 more expected)


Using line iteration via next,


>>> from urllib.request import urlopen
>>> handle = urlopen("http://www.python.org")
>>> for i in range(4):
...     line = next(handle)
... 
>>> rest = handle.read()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/xxx/lib/python3.5/http/client.py", line 446, in read
    s = self._safe_read(self.length)
  File "/Users/xxx/lib/python3.5/http/client.py", line 594, in _safe_read
    raise IncompleteRead(b''.join(s), amt)
http.client.IncompleteRead: IncompleteRead(46698 bytes read, 259 more expected)


Using line iteration directly,


>>> from urllib.request import urlopen
>>> count = 0
>>> handle = urlopen("http://www.python.org")
>>> for line in handle:
...     count += 1
...     if count == 4:
...         break
... 
>>> rest = handle.read()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/xxx/lib/python3.5/http/client.py", line 446, in read
    s = self._safe_read(self.length)
  File "/Users/xxx/lib/python3.5/http/client.py", line 594, in _safe_read
    raise IncompleteRead(b''.join(s), amt)
http.client.IncompleteRead: IncompleteRead(46698 bytes read, 259 more expected)



These examples all worked on Python 3.3 and 3.4, so this is a regression.
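
To check this without depending on python.org, here is a minimal offline sketch of the readline-then-read pattern. It is not part of the original test case: BODY, FixedBodyHandler, read_lines_then_rest and run_check are hypothetical names, and it assumes that a local server sending an explicit Content-Length exercises the same code path as the real site.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import ProxyHandler, build_opener

# Hypothetical payload: ten short lines, so there are more than four to read.
BODY = b"".join(b"line %d\n" % i for i in range(10))


class FixedBodyHandler(BaseHTTPRequestHandler):
    """Serve BODY with an explicit Content-Length header."""

    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(BODY)))
        self.end_headers()
        self.wfile.write(BODY)

    def log_message(self, format, *args):
        pass  # suppress per-request logging


def read_lines_then_rest(url, n_lines=4):
    """Read n_lines via readline(), then fetch the remainder via read()."""
    # An opener with an empty ProxyHandler ignores any proxy environment
    # variables; it returns the same response type as urlopen().
    opener = build_opener(ProxyHandler({}))
    handle = opener.open(url)
    try:
        lines = [handle.readline() for _ in range(n_lines)]
        rest = handle.read()
    finally:
        handle.close()
    return lines, rest


def run_check():
    """Start a throwaway local server and run the partial-read pattern."""
    server = HTTPServer(("127.0.0.1", 0), FixedBodyHandler)  # port 0: any free port
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        url = "http://127.0.0.1:%d/" % server.server_port
        return read_lines_then_rest(url)
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    lines, rest = run_check()
    print(b"".join(lines) + rest == BODY)
```

On an interpreter where the pattern works, the four lines plus the remainder should add up to the full body. On an affected 3.5 build I would expect the handle.read() call to raise http.client.IncompleteRead instead, as in the tracebacks above.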
History
Date                 User   Action  Args
2016-03-07 10:10:53  maubp  set     recipients: + maubp
2016-03-07 10:10:53  maubp  set     messageid: <1457345453.59.0.0327827539319.issue26499@psf.upfronthosting.co.za>
2016-03-07 10:10:53  maubp  link    issue26499 messages
2016-03-07 10:10:52  maubp  create