classification
Title: FileIO.readall() read the file position and size at each read
Type: Stage:
Components: IO Versions: Python 3.3
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: haypo, pitrou, python-dev
Priority: normal Keywords: patch

Created on 2011-05-25 12:13 by haypo, last changed 2011-05-30 00:30 by haypo. This issue is now closed.

Files
File name Uploaded Description Edit
fileio_readall.patch haypo, 2011-05-25 12:13 review
bufferedreader_readall-2.patch haypo, 2011-05-25 21:44 review
Messages (17)
msg136840 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2011-05-25 12:13
FileIO.readall() reads the file position and size before each call to read(), to adjust the buffer size.

Moreover FileIO.readall() calls lseek() on Windows: it should use _lseeki64() instead, to handle correctly file bigger than 2 GB (or maybe 4 GB? I don't know).

Attached patch fixes both problems.

--

BufferedReader.read() calls FileIO.read() until FileIO.read() returns an empty byte string. Why not calling FileIO.read() only once?
msg136842 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-05-25 12:19
> BufferedReader.read() calls FileIO.read() until FileIO.read() returns
> an empty byte string. Why not calling FileIO.read() only once?

BufferedReader doesn't call FileIO.read, it calls <rawio>.read().
The latter can be e.g. a socket and call recv(). If you want to read
till the end of stream, you have to repeat until recv() returns the
empty string.
msg136843 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-05-25 12:21
> FileIO.readall() reads the file position and size before each call to
> read(), to adjust the buffer size.
> 
> Moreover FileIO.readall() calls lseek() on Windows: it should use
> _lseeki64() instead, to handle correctly file bigger than 2 GB (or
> maybe 4 GB? I don't know).
> 
> Attached patch fixes both problems.

Looks ok to me. Did you test under Windows? Did you run some benchmarks?
msg136845 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2011-05-25 14:09
Oh, FileIO.readall() doesn't raise a ValueError if the file is closed => patch attached to fix this.
msg136846 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-05-25 14:10
+            raw = f
+            if hasattr(raw, 'buffer'):
+                raw = raw.buffer
+            if hasattr(raw, 'raw'):
+                raw = raw.raw
             f.close()
             self.assertRaises(ValueError, f.flush)
             self.assertRaises(ValueError, f.fileno)
@@ -2512,6 +2517,7 @@ class MiscIOTest(unittest.TestCase):
             self.assertRaises(ValueError, f.read)
             if hasattr(f, "read1"):
                 self.assertRaises(ValueError, f.read1, 1024)
+            self.assertRaises(ValueError, raw.readall)

Why not simply:

             if hasattr(f, "readall"):
                 self.assertRaises(ValueError, f.readall, 1024)
msg136848 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2011-05-25 14:28
RawIOBase.readall() fails if .read() returns None: fix attached.
msg136850 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2011-05-25 14:54
bufferedreader_readall.patch: BufferedReader.read(None) calls raw.readall() if available.

bufferedreader_readall.patch requires fileio_readall_closed.patch and rawiobase_readall.patch fixes.
msg136851 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-05-25 15:00
> bufferedreader_readall.patch: BufferedReader.read(None) calls
> raw.readall() if available.

Have you run any benchmarks?
msg136853 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2011-05-25 15:05
Number of syscalls without the patch -> with the patch:

- read the README file, text mode: 17 -> 12 syscalls (-5)
- read the README file, binary mode: 15 -> 11 syscalls (-4)
- read a binary file of 10 MB: 17 -> 12 syscalls (-5)

Quick benchmark open('README').read(); open('README', 'rb').read() in a loop: 359.1 ms -> 335.4 ms (-7%). So it's not really faster.
msg136855 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-05-25 15:20
About rawiobase_readall.patch: why do you use PyObject_Size() if you know chunks is a list? PyList_GET_SIZE is much more efficient.

About bufferedreader_readall.patch:

+            PyBytes_Concat(&data, all);
+            if (data == NULL)
+                return NULL;

You must Py_DECREF(all) first.

Also, you should check that "all" is either None or a bytes object.
msg136891 - (view) Author: Roundup Robot (python-dev) Date: 2011-05-25 20:13
New changeset 742ff94cdd20 by Victor Stinner in branch '3.1':
Issue #12175: FileIO.readall() now raises a ValueError instead of an IOError if
http://hg.python.org/cpython/rev/742ff94cdd20

New changeset 745e373c0b8e by Victor Stinner in branch '3.2':
(Merge 3.1) Issue #12175: FileIO.readall() now raises a ValueError instead of
http://hg.python.org/cpython/rev/745e373c0b8e

New changeset 9051275a68fe by Victor Stinner in branch 'default':
(Merge 3.2) Issue #12175: FileIO.readall() now raises a ValueError instead of
http://hg.python.org/cpython/rev/9051275a68fe
msg136892 - (view) Author: Roundup Robot (python-dev) Date: 2011-05-25 20:18
New changeset f2414bb35c96 by Victor Stinner in branch '2.7':
Issue #12175: FileIO.readall() now raises a ValueError instead of an IOError if
http://hg.python.org/cpython/rev/f2414bb35c96
msg136897 - (view) Author: Roundup Robot (python-dev) Date: 2011-05-25 20:53
New changeset 4a7118cff1d3 by Victor Stinner in branch '3.1':
Issue #12175: RawIOBase.readall() now returns None if read() returns None.
http://hg.python.org/cpython/rev/4a7118cff1d3

New changeset fb29dc650d24 by Victor Stinner in branch '3.2':
(Merge 3.1) Issue #12175: RawIOBase.readall() now returns None if read()
http://hg.python.org/cpython/rev/fb29dc650d24

New changeset 325453be64ec by Victor Stinner in branch 'default':
(Merge 3.2) Issue #12175: RawIOBase.readall() now returns None if read()
http://hg.python.org/cpython/rev/325453be64ec

New changeset 43a498da8680 by Victor Stinner in branch '2.7':
Issue #12175: RawIOBase.readall() now returns None if read() returns None.
http://hg.python.org/cpython/rev/43a498da8680
msg136902 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2011-05-25 21:32
> Attached patch [fileio_readall.patch] fixes both problems.

> Looks ok to me. Did you test under Windows?

Yes, test_io pass on Windows Vista 64 bits.

> Did you run some benchmarks?

Yes, see msg136853. Do you mean that I should not touch FileIO.readall() if it doesn't make it faster? (only 7% faster)
msg136903 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2011-05-25 21:42
> You must Py_DECREF(all) first.
> Also, you should check that "all" is either None or a bytes object.

Right, fixed in bufferedreader_readall-2.patch. By the way, thanks for your reviews.

So, do you think that fileio_readall.patch and bufferedreader_readall-2.patch should be commited, or these changes are useless?
msg136909 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-05-25 21:52
> So, do you think that fileio_readall.patch and
> bufferedreader_readall-2.patch should be commited, or these changes
> are useless?

They should be committed.
Thank you :)
msg136912 - (view) Author: Roundup Robot (python-dev) Date: 2011-05-25 22:22
New changeset 72c89daace57 by Victor Stinner in branch 'default':
Issue #12175: FileIO.readall() now only reads the file position and size once.
http://hg.python.org/cpython/rev/72c89daace57

New changeset 3c7792ec4547 by Victor Stinner in branch 'default':
Issue #12175: BufferedReader.read(-1) now calls raw.readall() if available.
http://hg.python.org/cpython/rev/3c7792ec4547
History
Date User Action Args
2011-05-30 00:30:52hayposetstatus: open -> closed
resolution: fixed
2011-05-25 22:22:48python-devsetmessages: + msg136912
2011-05-25 21:52:44pitrousetmessages: + msg136909
2011-05-25 21:46:39hayposetfiles: - rawiobase_readall.patch
2011-05-25 21:45:46hayposetfiles: - fileio_readall_closed.patch
2011-05-25 21:44:44hayposetfiles: - bufferedreader_readall.patch
2011-05-25 21:44:18hayposetfiles: + bufferedreader_readall-2.patch
2011-05-25 21:44:12hayposetfiles: - bufferedreader_readall-2.patch
2011-05-25 21:42:32hayposetfiles: + bufferedreader_readall-2.patch

messages: + msg136903
2011-05-25 21:32:15hayposetmessages: + msg136902
2011-05-25 20:53:58python-devsetmessages: + msg136897
2011-05-25 20:18:17python-devsetmessages: + msg136892
2011-05-25 20:13:53python-devsetnosy: + python-dev
messages: + msg136891
2011-05-25 15:20:50pitrousetmessages: + msg136855
2011-05-25 15:05:55hayposetmessages: + msg136853
2011-05-25 15:00:37pitrousetmessages: + msg136851
2011-05-25 14:54:18hayposetfiles: + bufferedreader_readall.patch

messages: + msg136850
2011-05-25 14:28:10hayposetfiles: + rawiobase_readall.patch

messages: + msg136848
2011-05-25 14:10:57pitrousetmessages: + msg136846
2011-05-25 14:09:25hayposetfiles: + fileio_readall_closed.patch

messages: + msg136845
2011-05-25 12:21:27pitrousetmessages: + msg136843
2011-05-25 12:19:47pitrousetmessages: + msg136842
2011-05-25 12:13:59haypocreate