# HG changeset patch
# Parent 154ae3af03173ce0f735b6dd47a37da97910f0b9
# Parent 0df93ab07a8f053996950873b82ff5cc90217da7
Issue #20699: Document that “io” methods accept bytes-like objects

This matches the usage of ZipFile and BufferedWriter. This still requires
return values to be bytes() objects.

Also document and test that the write() methods should only access their
argument before they return.

Work around the following Python 2 bugs:

* io.BytesIO cannot write array objects
* The native Python implementation of readinto() cannot write to arbitrary
  bytes-like objects

diff -r 0df93ab07a8f Doc/library/io.rst
--- a/Doc/library/io.rst Fri May 27 11:20:21 2016 +0000
+++ b/Doc/library/io.rst Fri May 27 23:47:48 2016 +0000
@@ -225,8 +225,11 @@
    support are called.
 
    The basic type used for binary data read from or written to a file is
-   :class:`bytes` (also known as :class:`str`). :class:`bytearray`\s are
-   accepted too, and in some cases (such as :class:`readinto`) required.
+   :class:`bytes` (also known as :class:`str`). Other
+   :term:`bytes-like objects <bytes-like object>` are
+   accepted as method arguments too. In some cases, such as
+   :meth:`~RawIOBase.readinto`, a writable object such as :class:`bytearray`
+   is required.
    Text I/O classes work with :class:`unicode` data.
 
    Note that calling any method (even inquiries) on a closed stream is
@@ -383,18 +386,22 @@
 
    .. method:: readinto(b)
 
-      Read up to len(b) bytes into bytearray *b* and return the number
+      Read bytes into a pre-allocated, writable
+      :term:`bytes-like object` *b*, and return the number
       of bytes read. If the object is in non-blocking mode and no
       bytes are available, ``None`` is returned.
 
    .. method:: write(b)
 
-      Write the given bytes or bytearray object, *b*, to the underlying raw
-      stream and return the number of bytes written. This can be less than
-      ``len(b)``, depending on specifics of the underlying raw stream, and
+      Write the given :term:`bytes-like object`, *b*, to the
+      underlying raw stream, and return the number of
+      bytes written. This can be less than the length of *b* in
+      bytes, depending on specifics of the underlying raw stream, and
       especially if it is in non-blocking mode. ``None`` is returned if the
       raw stream is set not to block and no single byte could be readily
-      written to it.
+      written to it. The caller may release or mutate *b* after
+      this method returns, so the implementation should only access *b*
+      during the method call.
 
 
 .. class:: BufferedIOBase
@@ -465,8 +472,8 @@
 
    .. method:: readinto(b)
 
-      Read up to len(b) bytes into bytearray *b* and return the number of bytes
-      read.
+      Read bytes into a pre-allocated, writable
+      :term:`bytes-like object` *b* and return the number of bytes read.
 
       Like :meth:`read`, multiple reads may be issued to the underlying raw
       stream, unless the latter is 'interactive'.
@@ -476,8 +483,9 @@
 
    .. method:: write(b)
 
-      Write the given bytes or bytearray object, *b* and return the number
-      of bytes written (never less than ``len(b)``, since if the write fails
+      Write the given :term:`bytes-like object`, *b*, and return the number
+      of bytes written (always equal to the length
+      of *b* in bytes, since if the write fails
       an :exc:`IOError` will be raised). Depending on the actual
       implementation, these bytes may be readily written to the underlying
       stream, or held in a buffer for performance and latency reasons.
@@ -486,6 +494,9 @@
       data needed to be written to the raw stream but it couldn't accept
       all the data without blocking.
 
+      The caller may release or mutate *b* after this method returns,
+      so the implementation should only access *b* during the method call.
+
 
 Raw File I/O
 ------------
@@ -535,7 +546,8 @@
    A stream implementation using an in-memory bytes buffer. It inherits
    :class:`BufferedIOBase`.
 
-   The argument *initial_bytes* is an optional initial :class:`bytes`.
+   The optional argument *initial_bytes* is a :term:`bytes-like object` that
+   contains initial data.
 
    :class:`BytesIO` provides or overrides these methods in addition to those
    from :class:`BufferedIOBase` and :class:`IOBase`:
@@ -611,7 +623,8 @@
 
    .. method:: write(b)
 
-      Write the bytes or bytearray object, *b* and return the number of bytes
+      Write the :term:`bytes-like object`, *b*,
+      and return the number of bytes
       written. When in non-blocking mode, a :exc:`BlockingIOError` is raised
       if the buffer needs to be written out but the raw stream blocks.
 
diff -r 0df93ab07a8f Lib/_pyio.py
--- a/Lib/_pyio.py Fri May 27 11:20:21 2016 +0000
+++ b/Lib/_pyio.py Fri May 27 23:47:48 2016 +0000
@@ -277,8 +277,9 @@
     may raise a IOError when operations they do not support are called.
 
     The basic type used for binary data read from or written to a file is
-    bytes. bytearrays are accepted too, and in some cases (such as
-    readinto) needed. Text I/O classes work with str data.
+    bytes. Other bytes-like objects are accepted as method arguments too. In
+    some cases (such as readinto), a writable object is required. Text I/O
+    classes work with str data.
 
     Note that calling any method (even inquiries) on a closed stream is
     undefined. Implementations may raise IOError in this case.
@@ -578,7 +579,7 @@
         return data
 
     def readinto(self, b):
-        """Read up to len(b) bytes into b.
+        """Read bytes into a pre-allocated bytes-like object b.
 
         Returns number of bytes read (0 for EOF), or None if the object
         is set not to block and has no data to read.
@@ -588,7 +589,8 @@
     def write(self, b):
         """Write the given buffer to the IO stream.
 
-        Returns the number of bytes written, which may be less than len(b).
+        Returns the number of bytes written, which may be less than the
+        length of b in bytes.
         """
         self._unsupported("write")
 
@@ -649,7 +651,8 @@
         Raises BlockingIOError if the underlying raw stream has no
         data at the moment.
         """
-        # XXX This ought to work with anything that supports the buffer API
+        # It is not practical to write to arbitrary bytes-like objects in
+        # Python 2
         data = self.read(len(b))
         n = len(data)
         try:
@@ -664,8 +667,8 @@
     def write(self, b):
         """Write the given buffer to the IO stream.
 
-        Return the number of bytes written, which is never less than
-        len(b).
+        Return the number of bytes written, which is always the length of b
+        in bytes.
 
         Raises BlockingIOError if the buffer is full and the
         underlying raw stream cannot accept more data at the moment.
@@ -840,7 +843,7 @@
             raise ValueError("write to closed file")
         if isinstance(b, unicode):
             raise TypeError("can't write unicode to binary stream")
-        n = len(b)
+        n = len(buffer(b))  # Size of any bytes-like object
         if n == 0:
             return 0
         pos = self._pos
@@ -1098,7 +1101,7 @@
                 # raise BlockingIOError with characters_written == 0.)
                 self._flush_unlocked()
             before = len(self._write_buf)
-            self._write_buf.extend(b)
+            self._write_buf.extend(buffer(b))
             written = len(self._write_buf) - before
             if len(self._write_buf) > self.buffer_size:
                 try:
diff -r 0df93ab07a8f Lib/test/test_io.py
--- a/Lib/test/test_io.py Fri May 27 11:20:21 2016 +0000
+++ b/Lib/test/test_io.py Fri May 27 23:47:48 2016 +0000
@@ -54,6 +54,19 @@
 __metaclass__ = type
 bytes = support.py3k_bytes
 
+try:
+    import ctypes
+except ImportError:
+    def byteslike(*pos, **kw):
+        return array.array("b", bytearray(*pos, **kw))
+else:
+    def byteslike(*pos, **kw):
+        """Create a bytes-like object having no string or sequence methods"""
+        data = bytearray(*pos, **kw)
+        class Struct(ctypes.Structure):
+            _fields_ = (("b", ctypes.c_ubyte * len(data)),)
+        return Struct.from_buffer_copy(data)
+
 def _default_chunk_size():
     """Get the default TextIOWrapper chunk size"""
     with io.open(__file__, "r", encoding="latin1") as f:
@@ -273,7 +286,9 @@
         self.assertEqual(f.tell(), 6)
         self.assertEqual(f.seek(-1, 1), 5)
         self.assertEqual(f.tell(), 5)
-        self.assertEqual(f.write(bytearray(b" world\n\n\n")), 9)
+        buffer = bytearray(b" world\n\n\n")
+        self.assertEqual(f.write(buffer), 9)
+        buffer[:] = b"*" * 9  # Overwrite our copy of the data
         self.assertEqual(f.seek(0), 0)
         self.assertEqual(f.write(b"h"), 1)
         self.assertEqual(f.seek(-1, 2), 13)
@@ -284,22 +299,28 @@
         self.assertRaises(TypeError, f.seek, 0.0)
 
     def read_ops(self, f, buffered=False):
+        if f.readinto.__module__ == "_pyio":
+            # Native Python code cannot write to arbitrary bytes-like objects
+            buffer_factory = bytearray
+        else:
+            buffer_factory = byteslike
         data = f.read(5)
         self.assertEqual(data, b"hello")
-        data = bytearray(data)
+        data = buffer_factory(data)
         self.assertEqual(f.readinto(data), 5)
-        self.assertEqual(data, b" worl")
+        self.assertEqual(bytearray(data), b" worl")
+        data = bytearray(5)
         self.assertEqual(f.readinto(data), 2)
         self.assertEqual(len(data), 5)
         self.assertEqual(data[:2], b"d\n")
         self.assertEqual(f.seek(0), 0)
         self.assertEqual(f.read(20), b"hello world\n")
         self.assertEqual(f.read(1), b"")
-        self.assertEqual(f.readinto(bytearray(b"x")), 0)
+        self.assertEqual(f.readinto(buffer_factory(b"x")), 0)
         self.assertEqual(f.seek(-6, 2), 6)
         self.assertEqual(f.read(5), b"world")
         self.assertEqual(f.read(0), b"")
-        self.assertEqual(f.readinto(bytearray()), 0)
+        self.assertEqual(f.readinto(buffer_factory()), 0)
         self.assertEqual(f.seek(-6, 1), 5)
         self.assertEqual(f.read(5), b" worl")
         self.assertEqual(f.tell(), 10)
@@ -504,10 +525,18 @@
     def test_array_writes(self):
         a = array.array(b'i', range(10))
         n = len(a.tostring())
-        with self.open(support.TESTFN, "wb", 0) as f:
-            self.assertEqual(f.write(a), n)
-        with self.open(support.TESTFN, "wb") as f:
-            self.assertEqual(f.write(a), n)
+        def check(f):
+            with f:
+                self.assertEqual(f.write(a), n)
+                f.writelines((a,))
+        if self.BytesIO is not io.BytesIO:
+            # C implementation thinks " 'array.array' does not have the
+            # buffer interface"
+            check(self.BytesIO())
+        check(self.FileIO(support.TESTFN, "w"))
+        check(self.BufferedWriter(self.MockRawIO()))
+        check(self.BufferedRandom(self.MockRawIO()))
+        check(self.BufferedRWPair(self.MockRawIO(), self.MockRawIO()))
 
     def test_closefd(self):
         self.assertRaises(ValueError, self.open, support.TESTFN, 'w',
@@ -649,6 +678,19 @@
         support.gc_collect()
         self.assertEqual(recorded, [])
 
+    def test_buffered_readinto_mixin(self):
+        # Test the implementation provided by BufferedIOBase
+        class Stream(self.BufferedIOBase):
+            def read(self, size):
+                return b"12345"
+        stream = Stream()
+        buffer = bytearray(5)
+        if self.BufferedIOBase is not pyio.BufferedIOBase:
+            # Native Python code cannot write to arbitrary bytes-like objects
+            buffer = byteslike(buffer)
+        self.assertEqual(stream.readinto(buffer), 5)
+        self.assertEqual(bytearray(buffer), b"12345")
+
 
 class CIOTest(IOTest):
 
@@ -671,9 +713,7 @@
         self.assertIsNone(wr(), wr)
 
 class PyIOTest(IOTest):
-    test_array_writes = unittest.skip(
-        "len(array.array) returns number of elements rather than bytelength"
-    )(IOTest.test_array_writes)
+    pass
 
 
 class CommonBufferedTests:
@@ -1111,6 +1151,11 @@
         bufio = self.tp(writer, 8)
         bufio.write(b"abc")
         self.assertFalse(writer._write_stack)
+        buffer = bytearray(b"def")
+        bufio.write(buffer)
+        buffer[:] = b"***"  # Overwrite our copy of the data
+        bufio.flush()
+        self.assertEqual(b"".join(writer._write_stack), b"abcdef")
 
     def test_write_overflow(self):
         writer = self.MockRawIO()
@@ -1441,8 +1486,10 @@
         pair = self.tp(self.BytesIO(b"abcdef"), self.MockRawIO())
 
         data = bytearray(5)
+        if self.tp is not pyio.BufferedRWPair:
+            data = byteslike(data)
         self.assertEqual(pair.readinto(data), 5)
-        self.assertEqual(data, b"abcde")
+        self.assertEqual(bytearray(data), b"abcde")
 
     def test_write(self):
         w = self.MockRawIO()
@@ -1450,7 +1497,9 @@
 
         pair.write(b"abc")
         pair.flush()
-        pair.write(b"def")
+        buffer = bytearray(b"def")
+        pair.write(buffer)
+        buffer[:] = b"***"  # Overwrite our copy of the data
         pair.flush()
         self.assertEqual(w._write_stack, [b"abc", b"def"])
 
diff -r 0df93ab07a8f Lib/test/test_memoryio.py
--- a/Lib/test/test_memoryio.py Fri May 27 11:20:21 2016 +0000
+++ b/Lib/test/test_memoryio.py Fri May 27 23:47:48 2016 +0000
@@ -396,6 +396,7 @@
 
 
 class PyBytesIOTest(MemoryTestMixin, MemorySeekTestMixin, unittest.TestCase):
+    # Test _pyio.BytesIO; class also inherited for testing C implementation
 
     UnsupportedOperation = pyio.UnsupportedOperation
 
diff -r 0df93ab07a8f Modules/_io/bufferedio.c
--- a/Modules/_io/bufferedio.c Fri May 27 11:20:21 2016 +0000
+++ b/Modules/_io/bufferedio.c Fri May 27 23:47:48 2016 +0000
@@ -125,8 +125,8 @@
 PyDoc_STRVAR(bufferediobase_write_doc,
     "Write the given buffer to the IO stream.\n"
     "\n"
-    "Returns the number of bytes written, which is never less than\n"
-    "len(b).\n"
+    "Returns the number of bytes written, which is always the length of b\n"
+    "in bytes.\n"
     "\n"
     "Raises BlockingIOError if the buffer is full and the\n"
     "underlying raw stream cannot accept more data at the moment.\n");
diff -r 0df93ab07a8f Modules/_io/bytesio.c
--- a/Modules/_io/bytesio.c Fri May 27 11:20:21 2016 +0000
+++ b/Modules/_io/bytesio.c Fri May 27 23:47:48 2016 +0000
@@ -392,7 +392,7 @@
 }
 
 PyDoc_STRVAR(readinto_doc,
-"readinto(bytearray) -> int. Read up to len(b) bytes into b.\n"
+"readinto(b) -> int. Read bytes into b.\n"
 "\n"
 "Returns number of bytes read (0 for EOF), or None if the object\n"
 "is set not to block and has no data to read.");
diff -r 0df93ab07a8f Modules/_io/fileio.c
--- a/Modules/_io/fileio.c Fri May 27 11:20:21 2016 +0000
+++ b/Modules/_io/fileio.c Fri May 27 23:47:48 2016 +0000
@@ -969,7 +969,7 @@
 "or None if no data is available. On end-of-file, returns ''.");
 
 PyDoc_STRVAR(write_doc,
-"write(b: bytes) -> int. Write bytes b to file, return number written.\n"
+"write(b: bytes-like) -> int. Write b, return number of bytes written.\n"
 "\n"
 "Only makes one system call, so not all of the data may be written.\n"
 "The number of bytes actually written is returned. In non-blocking mode,\n"
diff -r 0df93ab07a8f Modules/_io/iobase.c
--- a/Modules/_io/iobase.c Fri May 27 11:20:21 2016 +0000
+++ b/Modules/_io/iobase.c Fri May 27 23:47:48 2016 +0000
@@ -38,8 +38,9 @@
     "may raise a IOError when operations they do not support are called.\n"
     "\n"
     "The basic type used for binary data read from or written to a file is\n"
-    "bytes. bytearrays are accepted too, and in some cases (such as\n"
-    "readinto) needed. Text I/O classes work with str data.\n"
+    "bytes. Other bytes-like objects are accepted as method arguments too.\n"
+    "In some cases (such as readinto), a writable object is required. Text\n"
+    "I/O classes work with str data.\n"
     "\n"
     "Note that calling any method (except additional calls to close(),\n"
     "which are ignored) on a closed stream should raise a ValueError.\n"
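
Illustration (not part of the changeset): a minimal sketch of the three behaviours the documentation changes above describe. It assumes a Python 3 interpreter, where the C io.BytesIO accepts array objects; under Python 2.7 the commit message notes that io.BytesIO cannot write them, so the array cases would need _pyio there. The function names demo_write_then_mutate, demo_write_array and demo_readinto are hypothetical and exist only for this example.

```python
import array
import io


def demo_write_then_mutate():
    """write() may only access its argument during the call.

    The caller overwrites its buffer straight after write() returns;
    the data already handed to the stream must be unaffected.
    """
    stream = io.BytesIO()
    buf = bytearray(b"def")
    written = stream.write(buf)
    buf[:] = b"***"  # caller reuses its own copy of the data
    return written, stream.getvalue()


def demo_write_array():
    """The return value of write() counts bytes, not array elements."""
    a = array.array("i", range(10))
    stream = io.BytesIO()
    written = stream.write(a)  # assumption: Python 3 C implementation
    return written, len(a), a.itemsize


def demo_readinto():
    """readinto() fills a writable bytes-like object, not only bytearray."""
    stream = io.BytesIO(b"hello world")
    target = array.array("b", bytes(5))  # five zero bytes, pre-allocated
    count = stream.readinto(target)
    return count, target.tobytes()


if __name__ == "__main__":
    print(demo_write_then_mutate())
    print(demo_write_array())
    print(demo_readinto())
```

The array case is the reason the documentation now says "the length of *b* in bytes" rather than len(b): for an array of multi-byte items the two differ by a factor of the item size.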