classification
Title: Document that binary IO classes work with bytes-likes objects
Type: behavior Stage: resolved
Components: Documentation, IO Versions: Python 3.6, Python 3.5, Python 2.7
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: docs@python Nosy List: Henning.von.Bargen, benjamin.peterson, docs@python, martin.panter, pitrou, python-dev, r.david.murray, serhiy.storchaka
Priority: normal Keywords: patch

Created on 2014-02-20 09:49 by Henning.von.Bargen, last changed 2016-06-11 05:58 by serhiy.storchaka. This issue is now closed.

Files
File name Uploaded Description Edit
bytes-like-param.patch martin.panter, 2015-01-10 07:44 review
bytes-like-param.v2.patch martin.panter, 2015-02-12 11:13 review
bytes-like-param.v3.patch martin.panter, 2015-02-20 04:27 review
bytes-like-param.v4.patch martin.panter, 2015-02-23 04:37 review
bytes-like-param.v5.patch martin.panter, 2016-04-15 11:55 review
bytes-like-param.v6.patch martin.panter, 2016-05-22 02:09 review
bytes-like-param.py2.patch martin.panter, 2016-05-28 00:34 review
bytes-like-param.py2.v8.patch martin.panter, 2016-06-03 06:25 review
bytes-like-param.py2.v9.patch martin.panter, 2016-06-08 05:41 review
Messages (31)
msg211714 - (view) Author: Henning von Bargen (Henning.von.Bargen) Date: 2014-02-20 09:49
Regression: Behavior of ZipFile with file-like object and BufferedWriter.

The following code worked with Python 2.6:


LOB_BLOCKSIZE = 1024*1024 # 1 MB

class UnbufferedBlobWriter(io.RawIOBase):
    """
    A file-like wrapper for a write-only cx_Oracle BLOB object.
    """
    def __init__(self, blobLocator):
        self.blobLocator = blobLocator
        self.offset = 0
        self.blobLocator.open()

    def seekable(self):
        return True
        
    def seek(self, offset, whence):
        if whence == 0:
            self.offset = offset
        elif whence == 1:
            self.offset += offset
            if self.offset < 0:
                self.offset = 0
        elif whence == 2:
            if offset <= 0 and -offset <= self.blobLocator.size():
                self.offset = self.blobLocator.size() + offset
            else:
                raise IOError(96, "Invalid offset for BlobWriter")
        else:
            self._unsupported("seek")
        return self.offset
        
    def writable(self):
        return True

    def write(self, data, offset=None):
        if offset is None:
            offset = self.offset
        self.blobLocator.write(bytes(data), offset + 1)
        self.offset = offset + len(data)
        return len(data)

    def close(self):
        self.flush()
        self.blobLocator.close()        


def BlobWriter(blobLocator):
    """
    A file-like wrapper (buffered) for a write-only cx_Oracle BLOB object.
    """
    return io.BufferedWriter(UnbufferedBlobWriter(blobLocator), LOB_BLOCKGROESSE)


Note: The cx_Oracle BLOB object is used to store binary content inside a database.
It's basically a file-like-like object.

I'm using it in conjunction with a ZipFile object to store a ZipFile as a BLOB
inside the DB, like this:

    curs.execute("""
		 insert into ...  values (..., Empty_BLOB())
		 returning BDATA into :po_BDATA
		 """,
		 [..., blobvar])
    blob = BlobWriter(blobvar.getvalue())
    archive = ZipFile(blob, "w", ZIP_DEFLATED)
    for filename in ...:
	self.log.debug("Saving to ZIP file in the DB: %s", filename)
	archive.write(filename, filename)
    archive.close()

This used to work with Python 2.6.

With Python 2.7.5 however, somethin like this gets written into the blob:
<memory at 0x......>

Digging deeper, I found out that when using the UnbufferedBlobWriter directly
(without BufferedWriter), the problem does not occur.

It seems like the behaviour of the BufferedWriter class changed from 2.6 to 2.7,
most probably caused by the internal optimization of using the memoryview class.

As a workaround, I had to change my write method, calling tobytes() if necessary:

    def write(self, data, offset=None):
        if offset is None:
            offset = self.offset
        if hasattr(data, "tobytes"):
            self.blobLocator.write(data.tobytes(), offset + 1)
        else:
            self.blobLocator.write(bytes(data), offset + 1)
        self.offset = offset + len(data)
        return len(data)


I'm not sure if this is actually a bug in 2.7 or if my usage of BufferedWriter
is incorrect (see remark).

For understanding the problem it is important to know that the ZipFile.write
method often calls write and seek.

Remark:
If I am mis-using BufferedWriter: What precisely is wrong? And if so,
why is it so complicated to support a buffered-random-writer?
I cannot use io.BufferedRandom because I don't have a read method
(and ZipFile.write does not need that).
msg220019 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2014-06-08 06:28
I have a related issue in Python 3.4. I suspect it is the same underlying problem as Henning’s. BufferedWriter is trying to write memoryview() objects, but the documentation for RawIOBase.write() implies it only has to accept bytes() and bytearray() objects.

>>> from io import BufferedWriter, RawIOBase
>>> class Raw(RawIOBase):
...     def writable(self): return True
...     def write(self, b): print(b.startswith(b"\n"))
... 
>>> b = BufferedWriter(Raw())
>>> b.write(b"abc")
3
>>> b.close()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 3, in write
AttributeError: 'memoryview' object has no attribute 'startswith'
msg233802 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2015-01-10 07:44
I think the simplest thing to do here would be to update the documentation to match the usage. This patch does so, saying that all write() methods, as well as the BytesIO() constructor, have to accept bytes-like objects. It also expands some tests to verify this, and fixes a resulting bug in _pyio.
msg235805 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2015-02-12 11:13
Posting patch v2:

* Changed readinto() argument descriptions to “a pre-allocated, writable bytes-like buffer”, for both RawIOBase and BufferedIOBase
* Integrated the single-use test_memoryio.BytesIOMixin test class, which tricked me when I did the first patch
* Added tests for BufferedRWPair, BytesIO.readinto() etc methods with non-bytearray() buffers
* Fix _pyio.BufferedReader.readinto/1() for non-bytearray
msg235820 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-02-12 13:21
The patch LGTM. But native speaker should check documentation part.
msg236251 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2015-02-20 04:27
Posting patch v3; thanks for the reviews!

* Changed “buffer” → “object”
* Made it clearer that the bytes-like object generalization only applies to method arguments, not return values
* Minor fixes to wording
msg236279 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2015-02-20 13:04
What is your objection to "len(b)"?  When I read "len(b)" I know exactly what it means.  When I read "the number of bytes in b", I have to think about it it, because it could mean "the number of bytes that that b is long" or "the number of bytes that have been already written to b", and the latter is the meaning my mind goes to first, so it takes time for my mind to realize it is the first.
msg236282 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2015-02-20 13:37
Using len(b) is fine if b is a bytes() or bytearray() object, but a bytes-like object can be anything that you can pass to memoryview(). In general len(b) is not part of that protocol, so can return some other value, or even be unimplemented:

>>> from array import array
>>> b = array("H", range(2))
>>> len(b)
2
>>> bytes(b)  # Actually 4 bytes = 2 items × 2 bytes
b'\x00\x00\x01\x00'
>>> from ctypes import c_int
>>> b = c_int(100)
>>> len(b)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: object of type 'c_long' has no len()
>>> bytes(b)  # 4 bytes despite not implementing len()
b'd\x00\x00\x00'

I see your point that “the number of bytes in b” can be misleading. I will think about the wording some more. Maybe we can come up with a third alternative, like “the number of bytes given” or something.
msg236283 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-02-20 13:41
Because it is not len(b). I fixed several bugs in Python code which called 
len() for bytes-like argument and failed with array.array or memoryview with 
non-byte items.

The term "bytes-like object" is slightly misleading. In some cases it implies 
indexing and len, and iterating, and may be slicing -- common operations for 
bytes, bytearray, array('B'), memoryview().cast('B'). In more narrow meaning 
it may require such operations as concatenation (operator +) and .startswith() 
-- common for bytes and bytearray. In more general meaning it requires only 
the support of buffer protocol and contiguity. In more general meaning it may 
be even non-contiguous.
msg236287 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2015-02-20 14:25
How about "the length of b in bytes"?
msg236426 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2015-02-23 04:37
Posting bytes-like-param.v4.patch, with David’s suggested wording
msg263247 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2016-04-12 13:25
After thinking about Issue 26720 (see also Issue 15994), I think it might be worth documenting not only that bytes-like objects may be passed, but in some cases the class should not be access the object after the method returns. This applies to at least RawIOBase.readinto() and especially RawIOBase.write(). If you want to save the write() data in memory, you have to make a copy, because the original may be lost when BufferedWriter overwrites its internal buffeer.
msg263483 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2016-04-15 11:55
Here is an updated patch merged with recent changes. I also added a sentence to the RawIOBase.write() and BufferedIOBase.write() documentation about access to the buffer after the method returns. And I added tests for this.
msg265850 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-05-19 06:46
Added few minor comments on Rietveld.
msg266043 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2016-05-22 02:09
I updated my patch thanks to Serhiy’s comments. Also omitted Arg Clinic generated code in this version to avoid future merge conflicts with the hashes.
msg266055 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-05-22 06:35
bytes-like-param.v6.patch LGTM.
msg266523 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2016-05-28 00:34
Porting this to Python 2 uncovered a few flaws that I am unsure how to best handle. In this patch I have added workarounds, but perhaps we can find better fixes.

1. The io.BytesIO.write() C implementation does not accept array.array() objects. I suspect it only accepts objects implementing the “new” memoryview() buffer protocol, and array() only implements the “old” buffer() protocol.

In Python 2, is array.array() considered a bytes-like object? If so, the proposed update to the write() documentation is not consistent. Perhaps it is a real bug to fix in the BytesIO implementation. Or is test_array_writes() just going too far for Python 2?

I added code to skip the relevant test in test_array_writes(). Maybe it would be simplest to document that in general, BufferedIOBase.write() only has to accept “new” memoryview()-able objects.

2. The _pyio.BufferedIOBase.readinto() native Python implementation cannot assign into arbitrary bytes-like objects. Python 2 does not have memoryview.cast("B"), buffer() objects seem to be read-only, and I can’t think of any other easy way.

I reverted back to only testing native Python readinto() methods with bytearray(). But perhaps it would be best to document that for Python 2, only memoryview()-able arrays of bytes are acceptable, which should be practical to  handle in native Python code.

3. The _pyio.BytesIO.write() and BufferedWriter.write() native Python implementations did not handle obscure bytes-like objects well. I added buffer() calls to the code. But I am not sure this is a good idea, because it will raise warnings with “python -3”. An alternative might be to again only require write() to accept memoryview()-able arrays of bytes.

Perhaps the Python 2 definition of “bytes-like object” should be restricted to the “new” memoryview() buffer protocol, and to 1-D arrays of bytes. I.e. memoryview(byteslike).ndim == 1 and memoryview(byteslike).format == "B".
msg266525 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2016-05-28 01:35
New changeset bd41151a7429 by Martin Panter in branch '3.5':
Issue #20699: Document that “io” methods accept bytes-like objects
https://hg.python.org/cpython/rev/bd41151a7429

New changeset c1b40a29dc7a by Martin Panter in branch 'default':
Issue #20699: Merge io bytes-like fixes from 3.5
https://hg.python.org/cpython/rev/c1b40a29dc7a
msg266537 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-05-28 06:44
Things are more complex in 2.7. Not all objects supporting old buffer protocol support new buffer protocol, and not all functions accepting old buffer protocol work with new buffer protocol.

>>> buffer(array.array('I'))
<read-only buffer for 0xb70fec00, size -1, offset 0 at 0xb70feb60>
>>> memoryview(array.array('I'))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: cannot make memory view because object does not have the buffer interface
>>> zlib.compress(buffer('abc'))
"x\x9cKLJ\x06\x00\x02M\x01'"
>>> zlib.compress(memoryview('abc'))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: compress() argument 1 must be string or read-only buffer, not memoryview

The definition of the "bytes-like object" term (changed in 74e1c50498f8) looks not correct in 2.7, because it mentions memoryview (rarely supported in 2.7 besides backported from 3.x features), but not buffer (widely supported in 2.7). It is referred only in one place (in Doc/library/hmac.rst, copied from 3.x in issue21306). I would suggest to avoid using this term, and be more specific about supported protocols in every concrete case.
msg267057 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2016-06-03 06:25
I agree with avoiding the term “bytes-like object” in 2.7. It is easy to be specific when talking about concrete classes like BufferedWriter and BytesIO. But the harder question is what to specify for the abstract base classes.

In patch v8, I propose to change the documentation to refer to “sequences of bytes which are usable with ‘memoryview’ ”. Technically, some of the _pyio implementations assume more than this (e.g. len(), iteration and slice assignment), but I have left them alone.
msg267069 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-06-03 07:18
Since the _pyio implementation doesn't support general memoryview-like objects, I consider using the new buffer protocol in the C implementation an implementation detail. Other implementations can lack this support. I think it would be better to left the wording more general ("preallocated writable array of bytes"), but mention other concrete example (array('b'), are there other examples in the stdlib?).
msg267072 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2016-06-03 07:46
Actually this bug report was opened because an implementation that lacked memoryview support was broken by 2.7. The point is to document that (a subset of) memoryview objects may be passed to custom implementations.

Does this need a “changed in 2.7” notice? I am not so familiar with the history of Python 2 or the various buffer APIs to be sure.

Also see <https://bugs.python.org/issue20699#msg266523>, where I discovered array('b') does not work with io.BytesIO.write(). It seems you have to wrap it in buffer() first, presumably because array() does not support the “new” protocol, but buffer() does.
msg267079 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-06-03 08:09
array is supported explicitly in the Python implementation of readinto().

What if write not "usable with memoryview", but mention memoryview as an example?

   The object *b* should be a pre-allocated, writable array of bytes,
   such as :class:`bytearray`, :class:`array.array` or
   :class:`memoryview`.
msg267084 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2016-06-03 09:59
I don’t see the point of mentioning array() objects at all. It’s hard to support array in a Python 2 implementation, as demonstrated by readinto(). And the special support for array('b') won’t help if you pass in array('B') with values 128–255. I see this as an implementation detail of _pyio, rather than a need for others to follow its lead.

The array('b') code was added by r56225. I guess it is experimental code from an early phase of the io module. Also, test_array_writes() is disabled for _pyio in 2.7 (https://hg.python.org/cpython/diff/760a710eb6c1/Lib/test/test_io.py), and in 2.6 (http://svn.python.org/view/python/branches/trunk-bytearray/Lib/test/test_io.py?r1=61775&r2=61774&pathrev=61775&view=patch).

I think it is better to avoid “such as” and be specific about what has to be supported. Perhaps:

readinto(b): The object *b* should be a pre-allocated, writable array of bytes, either :class:`bytearray` or :class:`memoryview`.

.. versionchanged:: 2.7
   Support for :class:`memoryview`.

write(b): The object *b* should be an array of bytes, either :class:`str`, :class:`bytearray`, or :class:`memoryview`.

.. versionchanged:: 2.7
   Support for :class:`memoryview`.
msg267086 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-06-03 10:10
Why :class:`str` and not :class:`bytes`?
msg267087 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2016-06-03 10:15
I thought :class:`str` is better documented in Python 2, but I can write bytes if you prefer. I guess it is more consistent with the rest of io.rst.
msg267088 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-06-03 10:35
Current docs use "bytes or bytearray object" (without references). "bytes" is just an alias to "str" in 2.7, but I prefer to use this name, for consistency with 3.x and for accenting the binary nature of the data.

Your new wording LGTM.
msg267791 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2016-06-08 05:41
New patch that I plan to commit:

* Use bytes instead of str
* bytes, bytearray or memoryview for readinto() methods
* bytes or memoryview for write() methods
* Added single “Changed in version 2.7” notice under the IOBase class (rather than every version of readinto and write)
* Since we only require 1-D byte array memoryviews, I resurrected the len(b) text
* Removed outdated XXX comment about supporting buffer API, since bytearray and memoryview of bytes is now all that is required
msg268107 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2016-06-10 08:49
New changeset ec69518aeebc by Martin Panter in branch '2.7':
Issue #20699: Document that “io” methods should accept memoryview
https://hg.python.org/cpython/rev/ec69518aeebc
msg268185 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2016-06-11 05:28
Thanks for you help figuring this out Serhiy, especially for the Python 2 case.
msg268187 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-06-11 05:58
Oh, I forgot to say my LGTM on your last patch. It LGTM.

But I'm feeling the documentation epic about bytes-like objects is far from the end.
History
Date User Action Args
2016-06-11 05:58:43serhiy.storchakasetmessages: + msg268187
2016-06-11 05:28:44martin.pantersetstatus: open -> closed
resolution: fixed
messages: + msg268185

stage: commit review -> resolved
2016-06-10 08:49:42python-devsetmessages: + msg268107
2016-06-08 05:41:27martin.pantersetfiles: + bytes-like-param.py2.v9.patch

messages: + msg267791
stage: patch review -> commit review
2016-06-03 10:35:24serhiy.storchakasetmessages: + msg267088
2016-06-03 10:15:54martin.pantersetmessages: + msg267087
2016-06-03 10:10:01serhiy.storchakasetmessages: + msg267086
2016-06-03 09:59:13martin.pantersetmessages: + msg267084
2016-06-03 08:09:47serhiy.storchakasetmessages: + msg267079
2016-06-03 07:46:51martin.pantersetmessages: + msg267072
2016-06-03 07:18:49serhiy.storchakasetmessages: + msg267069
2016-06-03 06:25:52martin.pantersetfiles: + bytes-like-param.py2.v8.patch

messages: + msg267057
2016-05-28 06:44:41serhiy.storchakasetmessages: + msg266537
2016-05-28 01:35:48python-devsetnosy: + python-dev
messages: + msg266525
2016-05-28 00:34:35martin.pantersetfiles: + bytes-like-param.py2.patch

messages: + msg266523
2016-05-22 06:35:27serhiy.storchakasetmessages: + msg266055
2016-05-22 02:10:00martin.pantersetfiles: + bytes-like-param.v6.patch

messages: + msg266043
versions: + Python 3.6, - Python 3.4
2016-05-19 06:46:06serhiy.storchakasetmessages: + msg265850
2016-04-15 11:55:44martin.pantersetfiles: + bytes-like-param.v5.patch

messages: + msg263483
2016-04-12 13:25:06martin.pantersetmessages: + msg263247
2015-02-23 04:37:52martin.pantersetfiles: + bytes-like-param.v4.patch

messages: + msg236426
2015-02-20 14:25:43r.david.murraysetmessages: + msg236287
2015-02-20 13:41:16serhiy.storchakasetmessages: + msg236283
2015-02-20 13:37:46martin.pantersetmessages: + msg236282
2015-02-20 13:04:07r.david.murraysetmessages: + msg236279
2015-02-20 04:27:12martin.pantersetfiles: + bytes-like-param.v3.patch

messages: + msg236251
2015-02-12 13:21:24serhiy.storchakasetmessages: + msg235820
title: Behavior of ZipFile with file-like object and BufferedWriter. -> Document that binary IO classes work with bytes-likes objects
2015-02-12 11:13:08martin.pantersetfiles: + bytes-like-param.v2.patch

messages: + msg235805
2015-02-11 23:26:06serhiy.storchakasetnosy: + r.david.murray
2015-02-11 23:01:51serhiy.storchakalinkissue12340 superseder
2015-01-10 09:34:53serhiy.storchakasetnosy: + serhiy.storchaka
stage: patch review

versions: + Python 3.5
2015-01-10 07:44:25martin.pantersetfiles: + bytes-like-param.patch

assignee: docs@python
components: + Documentation

keywords: + patch
nosy: + docs@python
messages: + msg233802
2014-06-09 19:06:38ned.deilysetnosy: + pitrou, benjamin.peterson
2014-06-08 06:28:00martin.pantersetnosy: + martin.panter

messages: + msg220019
versions: + Python 3.4
2014-02-20 09:49:52Henning.von.Bargencreate