# HG changeset patch
# Parent 154ae3af03173ce0f735b6dd47a37da97910f0b9
# Parent 570ada02d0f059cee1b3d8fbaa140a70245eb8fb
Issue #20699: Document that “io” methods accept bytes-like objects

This matches the usage of ZipFile and BufferedWriter. This still requires
return values to be bytes() objects.

Also document and test that the write() methods should only access their
argument before they return.

diff -r 570ada02d0f0 Doc/library/io.rst
--- a/Doc/library/io.rst Fri Apr 15 02:27:11 2016 +0000
+++ b/Doc/library/io.rst Fri Apr 15 11:41:28 2016 +0000
@@ -66,7 +66,8 @@
 Binary I/O
 ^^^^^^^^^^
 
-Binary I/O (also called *buffered I/O*) expects and produces :class:`bytes`
+Binary I/O (also called *buffered I/O*) expects
+:term:`bytes-like objects <bytes-like object>` and produces :class:`bytes`
 objects. No encoding, decoding, or newline translation is performed. This
 category of streams can be used for all kinds of non-text data, and also when
 manual control over the handling of text data is desired.
@@ -227,9 +228,10 @@
    when operations they do not support are called.
 
    The basic type used for binary data read from or written to a file is
-   :class:`bytes`. :class:`bytearray`\s are accepted too, and in some cases
-   (such as :meth:`readinto`) required. Text I/O classes work with
-   :class:`str` data.
+   :class:`bytes`. Other :term:`bytes-like objects <bytes-like object>` are
+   accepted as method arguments too. In some cases, such as
+   :meth:`~RawIOBase.readinto`, a writable object such as :class:`bytearray`
+   is required. Text I/O classes work with :class:`str` data.
 
    Note that calling any method (even inquiries) on a closed stream is
    undefined. Implementations may raise :exc:`ValueError` in this case.
@@ -393,18 +395,22 @@
 
    .. method:: readinto(b)
 
-      Read up to ``len(b)`` bytes into :class:`bytearray` *b* and return the
+      Read bytes into a pre-allocated, writable
+      :term:`bytes-like object` *b*, and return the
       number of bytes read. If the object is in non-blocking mode and no bytes
       are available, ``None`` is returned.
 
    .. method:: write(b)
 
-      Write the given :class:`bytes` or :class:`bytearray` object, *b*, to the
-      underlying raw stream and return the number of bytes written. This can
-      be less than ``len(b)``, depending on specifics of the underlying raw
+      Write the given :term:`bytes-like object`, *b*, to the
+      underlying raw stream, and return the number of
+      bytes written. This can be less than the length of *b* in
+      bytes, depending on specifics of the underlying raw
       stream, and especially if it is in non-blocking mode. ``None`` is
       returned if the raw stream is set not to block and no single byte could
-      be readily written to it.
+      be readily written to it. The caller may release or mutate *b* after
+      this method returns, so the implementation should only access *b*
+      during the method call.
 
 
 .. class:: BufferedIOBase
@@ -476,8 +482,8 @@
 
    .. method:: readinto(b)
 
-      Read up to ``len(b)`` bytes into bytearray *b* and return the number of
-      bytes read.
+      Read bytes into a pre-allocated, writable
+      :term:`bytes-like object` *b* and return the number of bytes read.
 
       Like :meth:`read`, multiple reads may be issued to the underlying raw
       stream, unless the latter is interactive.
@@ -487,7 +493,8 @@
 
    .. method:: readinto1(b)
 
-      Read up to ``len(b)`` bytes into bytearray *b*, using at most one call to
+      Read bytes into a pre-allocated, writable
+      :term:`bytes-like object` *b*, using at most one call to
       the underlying raw stream's :meth:`~RawIOBase.read` (or
       :meth:`~RawIOBase.readinto`) method. Return the number of bytes read.
 
@@ -498,8 +505,8 @@
 
    .. method:: write(b)
 
-      Write the given :class:`bytes` or :class:`bytearray` object, *b* and
-      return the number of bytes written (never less than ``len(b)``, since if
+      Write the given :term:`bytes-like object`, *b*, and return the number
+      of bytes written (always equal to the length of *b* in bytes, since if
       the write fails an :exc:`OSError` will be raised). Depending on the
       actual implementation, these bytes may be readily written to the
       underlying stream, or held in a buffer for performance and latency
@@ -509,6 +516,9 @@
       data needed to be written to the raw stream but it couldn't accept
       all the data without blocking.
 
+      The caller may release or mutate *b* after this method returns,
+      so the implementation should only access *b* during the method call.
+
 
 Raw File I/O
 ^^^^^^^^^^^^
@@ -584,7 +594,8 @@
    :class:`BufferedIOBase`. The buffer is discarded when the
    :meth:`~IOBase.close` method is called.
 
-   The argument *initial_bytes* contains optional initial :class:`bytes` data.
+   The argument *initial_bytes* is a :term:`bytes-like object` that
+   contains initial data.
 
    :class:`BytesIO` provides or overrides these methods in addition to those
    from :class:`BufferedIOBase` and :class:`IOBase`:
@@ -682,7 +693,7 @@
 
    .. method:: write(b)
 
-      Write the :class:`bytes` or :class:`bytearray` object, *b* and return the
+      Write the :term:`bytes-like object`, *b*, and return the
       number of bytes written. When in non-blocking mode, a
       :exc:`BlockingIOError` is raised if the buffer needs to be written out but
       the raw stream blocks.
diff -r 570ada02d0f0 Lib/_pyio.py
--- a/Lib/_pyio.py Fri Apr 15 02:27:11 2016 +0000
+++ b/Lib/_pyio.py Fri Apr 15 11:41:28 2016 +0000
@@ -296,8 +296,9 @@
     called.
 
     The basic type used for binary data read from or written to a file is
-    bytes. bytearrays are accepted too, and in some cases (such as
-    readinto) needed. Text I/O classes work with str data.
+    bytes. Other bytes-like objects are accepted as method arguments too. In
+    some cases (such as readinto), a writable object is required. Text I/O
+    classes work with str data.
 
     Note that calling any method (even inquiries) on a closed stream is
     undefined. Implementations may raise OSError in this case.
@@ -596,7 +597,7 @@
         return data
 
     def readinto(self, b):
-        """Read up to len(b) bytes into bytearray b.
+        """Read bytes into a pre-allocated bytes-like object b.
 
         Returns an int representing the number of bytes read (0 for EOF), or
         None if the object is set not to block and has no data to read.
@@ -606,7 +607,8 @@
     def write(self, b):
         """Write the given buffer to the IO stream.
 
-        Returns the number of bytes written, which may be less than len(b).
+        Returns the number of bytes written, which may be less than the
+        length of b in bytes.
         """
         self._unsupported("write")
 
@@ -659,7 +661,7 @@
         self._unsupported("read1")
 
     def readinto(self, b):
-        """Read up to len(b) bytes into bytearray b.
+        """Read bytes into a pre-allocated bytes-like object b.
 
         Like read(), this may issue multiple reads to the underlying raw
         stream, unless the latter is 'interactive'.
@@ -673,7 +675,7 @@
         return self._readinto(b, read1=False)
 
     def readinto1(self, b):
-        """Read up to len(b) bytes into *b*, using at most one system call
+        """Read bytes into buffer *b*, using at most one system call
 
         Returns an int representing the number of bytes read (0 for EOF).
 
@@ -701,8 +703,8 @@
     def write(self, b):
         """Write the given bytes buffer to the IO stream.
 
-        Return the number of bytes written, which is never less than
-        len(b).
+        Return the number of bytes written, which is always the length of b
+        in bytes.
 
         Raises BlockingIOError if the buffer is full and the underlying raw
         stream cannot accept more data at the moment.
@@ -884,7 +886,8 @@
             raise ValueError("write to closed file")
         if isinstance(b, str):
             raise TypeError("can't write str to binary stream")
-        n = len(b)
+        with memoryview(b) as view:
+            n = view.nbytes # Size of any bytes-like object
         if n == 0:
             return 0
         pos = self._pos
@@ -1090,14 +1093,13 @@
     def _readinto(self, buf, read1):
         """Read data into *buf* with at most one system call."""
 
-        if len(buf) == 0:
-            return 0
-
         # Need to create a memoryview object of type 'b', otherwise
         # we may not be able to assign bytes to it, and slicing it
         # would create a new object.
         if not isinstance(buf, memoryview):
             buf = memoryview(buf)
+        if buf.nbytes == 0:
+            return 0
         buf = buf.cast('B')
 
         written = 0
diff -r 570ada02d0f0 Lib/test/test_io.py
--- a/Lib/test/test_io.py Fri Apr 15 02:27:11 2016 +0000
+++ b/Lib/test/test_io.py Fri Apr 15 11:41:28 2016 +0000
@@ -45,6 +45,22 @@
 except ImportError:
     threading = None
 
+try:
+    import ctypes
+except ImportError:
+    def byteslike(*pos, **kw):
+        return array.array("b", bytes(*pos, **kw))
+else:
+    def byteslike(*pos, **kw):
+        """Create a bytes-like object having no string or sequence methods"""
+        data = bytes(*pos, **kw)
+        obj = EmptyStruct()
+        ctypes.resize(obj, len(data))
+        memoryview(obj).cast("B")[:] = data
+        return obj
+    class EmptyStruct(ctypes.Structure):
+        pass
+
 def _default_chunk_size():
     """Get the default TextIOWrapper chunk size"""
     with open(__file__, "r", encoding="latin-1") as f:
@@ -284,7 +300,9 @@
         self.assertEqual(f.tell(), 6)
         self.assertEqual(f.seek(-1, 1), 5)
         self.assertEqual(f.tell(), 5)
-        self.assertEqual(f.write(bytearray(b" world\n\n\n")), 9)
+        buffer = bytearray(b" world\n\n\n")
+        self.assertEqual(f.write(buffer), 9)
+        buffer[:] = b"*" * 9 # Overwrite our copy of the data
         self.assertEqual(f.seek(0), 0)
         self.assertEqual(f.write(b"h"), 1)
         self.assertEqual(f.seek(-1, 2), 13)
@@ -297,20 +315,21 @@
     def read_ops(self, f, buffered=False):
         data = f.read(5)
         self.assertEqual(data, b"hello")
-        data = bytearray(data)
+        data = byteslike(data)
         self.assertEqual(f.readinto(data), 5)
-        self.assertEqual(data, b" worl")
+        self.assertEqual(memoryview(data).tobytes(), b" worl")
+        data = bytearray(5)
         self.assertEqual(f.readinto(data), 2)
         self.assertEqual(len(data), 5)
         self.assertEqual(data[:2], b"d\n")
         self.assertEqual(f.seek(0), 0)
         self.assertEqual(f.read(20), b"hello world\n")
         self.assertEqual(f.read(1), b"")
-        self.assertEqual(f.readinto(bytearray(b"x")), 0)
+        self.assertEqual(f.readinto(byteslike(b"x")), 0)
         self.assertEqual(f.seek(-6, 2), 6)
         self.assertEqual(f.read(5), b"world")
         self.assertEqual(f.read(0), b"")
-        self.assertEqual(f.readinto(bytearray()), 0)
+        self.assertEqual(f.readinto(byteslike()), 0)
         self.assertEqual(f.seek(-6, 1), 5)
         self.assertEqual(f.read(5), b" worl")
         self.assertEqual(f.tell(), 10)
@@ -321,6 +340,10 @@
             f.seek(6)
             self.assertEqual(f.read(), b"world\n")
             self.assertEqual(f.read(), b"")
+            f.seek(0)
+            data = byteslike(5)
+            self.assertEqual(f.readinto1(data), 5)
+            self.assertEqual(bytes(data), b"hello")
 
     LARGE = 2**31
 
@@ -641,10 +664,22 @@
     def test_array_writes(self):
         a = array.array('i', range(10))
         n = len(a.tobytes())
-        with self.open(support.TESTFN, "wb", 0) as f:
-            self.assertEqual(f.write(a), n)
-        with self.open(support.TESTFN, "wb") as f:
-            self.assertEqual(f.write(a), n)
+        def make_file_writer():
+            return self.FileIO(support.TESTFN, "w")
+        def make_buffered_writer():
+            return self.BufferedWriter(self.MockRawIO())
+        def make_buffered_random():
+            return self.BufferedRandom(self.MockRawIO())
+        def make_buffered_rw_pair():
+            return self.BufferedRWPair(self.MockRawIO(), self.MockRawIO())
+        writer_factories = (
+            self.BytesIO, make_file_writer, make_buffered_writer,
+            make_buffered_random, make_buffered_rw_pair,
+        )
+        for factory in writer_factories:
+            with self.subTest(factory), factory() as f:
+                self.assertEqual(f.write(a), n)
+                f.writelines((a,))
 
     def test_closefd(self):
         self.assertRaises(ValueError, self.open, support.TESTFN, 'w',
@@ -803,6 +838,19 @@
         with self.assertRaises(ValueError):
             self.open(support.TESTFN, 'w', newline='invalid')
 
+    def test_buffered_readinto_mixin(self):
+        # Test the implementation provided by BufferedIOBase
+        class Stream(self.BufferedIOBase):
+            def read(self, size):
+                return b"12345"
+            read1 = read
+        stream = Stream()
+        for method in ("readinto", "readinto1"):
+            with self.subTest(method):
+                buffer = byteslike(5)
+                self.assertEqual(getattr(stream, method)(buffer), 5)
+                self.assertEqual(bytes(buffer), b"12345")
+
 
 class CIOTest(IOTest):
 
@@ -1394,6 +1442,11 @@
         bufio = self.tp(writer, 8)
         bufio.write(b"abc")
         self.assertFalse(writer._write_stack)
+        buffer = bytearray(b"def")
+        bufio.write(buffer)
+        buffer[:] = b"***" # Overwrite our copy of the data
+        bufio.flush()
+        self.assertEqual(b"".join(writer._write_stack), b"abcdef")
 
     def test_write_overflow(self):
         writer = self.MockRawIO()
@@ -1720,11 +1773,13 @@
         self.assertEqual(pair.read1(3), b"abc")
 
     def test_readinto(self):
-        pair = self.tp(self.BytesIO(b"abcdef"), self.MockRawIO())
-
-        data = bytearray(5)
-        self.assertEqual(pair.readinto(data), 5)
-        self.assertEqual(data, b"abcde")
+        for method in ("readinto", "readinto1"):
+            with self.subTest(method):
+                pair = self.tp(self.BytesIO(b"abcdef"), self.MockRawIO())
+
+                data = byteslike(5)
+                self.assertEqual(getattr(pair, method)(data), 5)
+                self.assertEqual(bytes(data), b"abcde")
 
     def test_write(self):
         w = self.MockRawIO()
@@ -1732,7 +1787,9 @@
 
         pair.write(b"abc")
         pair.flush()
-        pair.write(b"def")
+        buffer = bytearray(b"def")
+        pair.write(buffer)
+        buffer[:] = b"***" # Overwrite our copy of the data
         pair.flush()
         self.assertEqual(w._write_stack, [b"abc", b"def"])
 
diff -r 570ada02d0f0 Lib/test/test_memoryio.py
--- a/Lib/test/test_memoryio.py Fri Apr 15 02:27:11 2016 +0000
+++ b/Lib/test/test_memoryio.py Fri Apr 15 11:41:28 2016 +0000
@@ -399,7 +399,16 @@
         del __main__.PickleTestMemIO
 
 
-class BytesIOMixin:
+class PyBytesIOTest(MemoryTestMixin, MemorySeekTestMixin, unittest.TestCase):
+    # Test _pyio.BytesIO; class also inherited for testing C implementation
+
+    UnsupportedOperation = pyio.UnsupportedOperation
+
+    @staticmethod
+    def buftype(s):
+        return s.encode("ascii")
+    ioclass = pyio.BytesIO
+    EOF = b""
 
     def test_getbuffer(self):
         memio = self.ioclass(b"1234567890")
@@ -426,18 +435,6 @@
         memio.close()
         self.assertRaises(ValueError, memio.getbuffer)
 
-
-class PyBytesIOTest(MemoryTestMixin, MemorySeekTestMixin,
-                    BytesIOMixin, unittest.TestCase):
-
-    UnsupportedOperation = pyio.UnsupportedOperation
-
-    @staticmethod
-    def buftype(s):
-        return s.encode("ascii")
-    ioclass = pyio.BytesIO
-    EOF = b""
-
     def test_read1(self):
         buf = self.buftype("1234567890")
         memio = self.ioclass(buf)
diff -r 570ada02d0f0 Modules/_io/bufferedio.c
--- a/Modules/_io/bufferedio.c Fri Apr 15 02:27:11 2016 +0000
+++ b/Modules/_io/bufferedio.c Fri Apr 15 11:41:28 2016 +0000
@@ -190,8 +190,8 @@
 PyDoc_STRVAR(bufferediobase_write_doc,
     "Write the given buffer to the IO stream.\n"
     "\n"
-    "Returns the number of bytes written, which is never less than\n"
-    "len(b).\n"
+    "Returns the number of bytes written, which is always the length of b\n"
+    "in bytes.\n"
     "\n"
     "Raises BlockingIOError if the buffer is full and the\n"
     "underlying raw stream cannot accept more data at the moment.\n");
diff -r 570ada02d0f0 Modules/_io/bytesio.c
--- a/Modules/_io/bytesio.c Fri Apr 15 02:27:11 2016 +0000
+++ b/Modules/_io/bytesio.c Fri Apr 15 11:41:28 2016 +0000
@@ -546,7 +546,7 @@
     buffer: Py_buffer(accept={rwbuffer})
     /
 
-Read up to len(buffer) bytes into buffer.
+Read bytes into buffer.
 
 Returns number of bytes read (0 for EOF), or None if the object
 is set not to block as has no data to read.
@@ -554,7 +554,7 @@
 
 static PyObject *
 _io_BytesIO_readinto_impl(bytesio *self, Py_buffer *buffer)
-/*[clinic end generated code: output=a5d407217dcf0639 input=71581f32635c3a31]*/
+/*[clinic end generated code: output=a5d407217dcf0639 input=239668f4d5c47d84]*/
 {
     Py_ssize_t len, n;
 
diff -r 570ada02d0f0 Modules/_io/clinic/bytesio.c.h
--- a/Modules/_io/clinic/bytesio.c.h Fri Apr 15 02:27:11 2016 +0000
+++ b/Modules/_io/clinic/bytesio.c.h Fri Apr 15 11:41:28 2016 +0000
@@ -259,7 +259,7 @@
 "readinto($self, buffer, /)\n"
 "--\n"
 "\n"
-"Read up to len(buffer) bytes into buffer.\n"
+"Read bytes into buffer.\n"
 "\n"
 "Returns number of bytes read (0 for EOF), or None if the object\n"
 "is set not to block as has no data to read.");
@@ -419,4 +419,4 @@
 exit:
     return return_value;
 }
-/*[clinic end generated code: output=500ccc149587fac4 input=a9049054013a1b77]*/
+/*[clinic end generated code: output=65860972f1c4b5fc input=a9049054013a1b77]*/
diff -r 570ada02d0f0 Modules/_io/clinic/fileio.c.h
--- a/Modules/_io/clinic/fileio.c.h Fri Apr 15 02:27:11 2016 +0000
+++ b/Modules/_io/clinic/fileio.c.h Fri Apr 15 11:41:28 2016 +0000
@@ -222,7 +222,7 @@
 "write($self, b, /)\n"
 "--\n"
 "\n"
-"Write bytes b to file, return number written.\n"
+"Write buffer b to file, return number of bytes written.\n"
 "\n"
 "Only makes one system call, so not all of the data may be written.\n"
 "The number of bytes actually written is returned. In non-blocking mode,\n"
@@ -364,4 +364,4 @@
 #ifndef _IO_FILEIO_TRUNCATE_METHODDEF
     #define _IO_FILEIO_TRUNCATE_METHODDEF
 #endif /* !defined(_IO_FILEIO_TRUNCATE_METHODDEF) */
-/*[clinic end generated code: output=b1a20b10c81add64 input=a9049054013a1b77]*/
+/*[clinic end generated code: output=dcbc39b466598492 input=a9049054013a1b77]*/
diff -r 570ada02d0f0 Modules/_io/fileio.c
--- a/Modules/_io/fileio.c Fri Apr 15 02:27:11 2016 +0000
+++ b/Modules/_io/fileio.c Fri Apr 15 11:41:28 2016 +0000
@@ -835,7 +835,7 @@
     b: Py_buffer
     /
 
-Write bytes b to file, return number written.
+Write buffer b to file, return number of bytes written.
 
 Only makes one system call, so not all of the data may be written.
 The number of bytes actually written is returned. In non-blocking mode,
@@ -844,7 +844,7 @@
 
 static PyObject *
 _io_FileIO_write_impl(fileio *self, Py_buffer *b)
-/*[clinic end generated code: output=b4059db3d363a2f7 input=ffbd8834f447ac31]*/
+/*[clinic end generated code: output=b4059db3d363a2f7 input=6e7908b36f0ce74f]*/
 {
     Py_ssize_t n;
     int err;
diff -r 570ada02d0f0 Modules/_io/iobase.c
--- a/Modules/_io/iobase.c Fri Apr 15 02:27:11 2016 +0000
+++ b/Modules/_io/iobase.c Fri Apr 15 11:41:28 2016 +0000
@@ -53,8 +53,9 @@
     "called.\n"
     "\n"
     "The basic type used for binary data read from or written to a file is\n"
-    "bytes. bytearrays are accepted too, and in some cases (such as\n"
-    "readinto) needed. Text I/O classes work with str data.\n"
+    "bytes. Other bytes-like objects are accepted as method arguments too.\n"
+    "In some cases (such as readinto), a writable object is required. Text\n"
+    "I/O classes work with str data.\n"
     "\n"
     "Note that calling any method (except additional calls to close(),\n"
     "which are ignored) on a closed stream should raise a ValueError.\n"
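
Reviewer note (illustration only, not part of the changeset): the behaviour documented above can be sketched with the standard `io`, `array`, and `memoryview` APIs. This is a minimal sketch assuming a Python version that includes this change. The variable names are hypothetical, and `io.BytesIO` stands in for the other stream classes touched by the patch. It shows why the docs now say "length of *b* in bytes" rather than `len(b)`, that `write()` must not retain the caller's buffer, and that `readinto()` accepts any writable bytes-like object.

```python
import array
import io

# For a multi-byte array, len() counts items while nbytes counts bytes.
items = array.array("i", range(10))
with memoryview(items) as view:
    size_in_bytes = view.nbytes  # size of any bytes-like object

sink = io.BytesIO()
written = sink.write(items)  # write() accepts any bytes-like object

# The stream copies the data during the call, so mutating the caller's
# buffer after write() returns does not change what was written.
payload = bytearray(b"hello")
stream = io.BytesIO()
stream.write(payload)
payload[:] = b"*****"  # overwrite our copy of the data
stored = stream.getvalue()

# readinto() fills a pre-allocated, writable bytes-like object; it does
# not have to be a bytearray.
target = array.array("b", bytes(5))
source = io.BytesIO(b"hello world")
count = source.readinto(target)
```

Here `written` equals `size_in_bytes`, which is `len(items) * items.itemsize` and not `len(items)`. That is the distinction the `memoryview(b).nbytes` change in `Lib/_pyio.py` addresses.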