msg211714 - (view) |
Author: Henning von Bargen (Henning.von.Bargen) |
Date: 2014-02-20 09:49 |
Regression: Behavior of ZipFile with file-like object and BufferedWriter.
The following code worked with Python 2.6:
LOB_BLOCKSIZE = 1024*1024 # 1 MB
class UnbufferedBlobWriter(io.RawIOBase):
"""
A file-like wrapper for a write-only cx_Oracle BLOB object.
"""
def __init__(self, blobLocator):
self.blobLocator = blobLocator
self.offset = 0
self.blobLocator.open()
def seekable(self):
return True
def seek(self, offset, whence):
if whence == 0:
self.offset = offset
elif whence == 1:
self.offset += offset
if self.offset < 0:
self.offset = 0
elif whence == 2:
if offset <= 0 and -offset <= self.blobLocator.size():
self.offset = self.blobLocator.size() + offset
else:
raise IOError(96, "Invalid offset for BlobWriter")
else:
self._unsupported("seek")
return self.offset
def writable(self):
return True
def write(self, data, offset=None):
if offset is None:
offset = self.offset
self.blobLocator.write(bytes(data), offset + 1)
self.offset = offset + len(data)
return len(data)
def close(self):
self.flush()
self.blobLocator.close()
def BlobWriter(blobLocator):
"""
A file-like wrapper (buffered) for a write-only cx_Oracle BLOB object.
"""
return io.BufferedWriter(UnbufferedBlobWriter(blobLocator), LOB_BLOCKGROESSE)
Note: The cx_Oracle BLOB object is used to store binary content inside a database.
It's basically a file-like-like object.
I'm using it in conjunction with a ZipFile object to store a ZipFile as a BLOB
inside the DB, like this:
curs.execute("""
insert into ... values (..., Empty_BLOB())
returning BDATA into :po_BDATA
""",
[..., blobvar])
blob = BlobWriter(blobvar.getvalue())
archive = ZipFile(blob, "w", ZIP_DEFLATED)
for filename in ...:
self.log.debug("Saving to ZIP file in the DB: %s", filename)
archive.write(filename, filename)
archive.close()
This used to work with Python 2.6.
With Python 2.7.5 however, somethin like this gets written into the blob:
<memory at 0x......>
Digging deeper, I found out that when using the UnbufferedBlobWriter directly
(without BufferedWriter), the problem does not occur.
It seems like the behaviour of the BufferedWriter class changed from 2.6 to 2.7,
most probably caused by the internal optimization of using the memoryview class.
As a workaround, I had to change my write method, calling tobytes() if necessary:
def write(self, data, offset=None):
if offset is None:
offset = self.offset
if hasattr(data, "tobytes"):
self.blobLocator.write(data.tobytes(), offset + 1)
else:
self.blobLocator.write(bytes(data), offset + 1)
self.offset = offset + len(data)
return len(data)
I'm not sure if this is actually a bug in 2.7 or if my usage of BufferedWriter
is incorrect (see remark).
For understanding the problem it is important to know that the ZipFile.write
method often calls write and seek.
Remark:
If I am mis-using BufferedWriter: What precisely is wrong? And if so,
why is it so complicated to support a buffered-random-writer?
I cannot use io.BufferedRandom because I don't have a read method
(and ZipFile.write does not need that).
|
msg220019 - (view) |
Author: Martin Panter (martin.panter) * |
Date: 2014-06-08 06:28 |
I have a related issue in Python 3.4. I suspect it is the same underlying problem as Henning’s. BufferedWriter is trying to write memoryview() objects, but the documentation for RawIOBase.write() implies it only has to accept bytes() and bytearray() objects.
>>> from io import BufferedWriter, RawIOBase
>>> class Raw(RawIOBase):
... def writable(self): return True
... def write(self, b): print(b.startswith(b"\n"))
...
>>> b = BufferedWriter(Raw())
>>> b.write(b"abc")
3
>>> b.close()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 3, in write
AttributeError: 'memoryview' object has no attribute 'startswith'
|
msg233802 - (view) |
Author: Martin Panter (martin.panter) * |
Date: 2015-01-10 07:44 |
I think the simplest thing to do here would be to update the documentation to match the usage. This patch does so, saying that all write() methods, as well as the BytesIO() constructor, have to accept bytes-like objects. It also expands some tests to verify this, and fixes a resulting bug in _pyio.
|
msg235805 - (view) |
Author: Martin Panter (martin.panter) * |
Date: 2015-02-12 11:13 |
Posting patch v2:
* Changed readinto() argument descriptions to “a pre-allocated, writable bytes-like buffer”, for both RawIOBase and BufferedIOBase
* Integrated the single-use test_memoryio.BytesIOMixin test class, which tricked me when I did the first patch
* Added tests for BufferedRWPair, BytesIO.readinto() etc methods with non-bytearray() buffers
* Fix _pyio.BufferedReader.readinto/1() for non-bytearray
|
msg235820 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2015-02-12 13:21 |
The patch LGTM. But native speaker should check documentation part.
|
msg236251 - (view) |
Author: Martin Panter (martin.panter) * |
Date: 2015-02-20 04:27 |
Posting patch v3; thanks for the reviews!
* Changed “buffer” → “object”
* Made it clearer that the bytes-like object generalization only applies to method arguments, not return values
* Minor fixes to wording
|
msg236279 - (view) |
Author: R. David Murray (r.david.murray) * |
Date: 2015-02-20 13:04 |
What is your objection to "len(b)"? When I read "len(b)" I know exactly what it means. When I read "the number of bytes in b", I have to think about it it, because it could mean "the number of bytes that that b is long" or "the number of bytes that have been already written to b", and the latter is the meaning my mind goes to first, so it takes time for my mind to realize it is the first.
|
msg236282 - (view) |
Author: Martin Panter (martin.panter) * |
Date: 2015-02-20 13:37 |
Using len(b) is fine if b is a bytes() or bytearray() object, but a bytes-like object can be anything that you can pass to memoryview(). In general len(b) is not part of that protocol, so can return some other value, or even be unimplemented:
>>> from array import array
>>> b = array("H", range(2))
>>> len(b)
2
>>> bytes(b) # Actually 4 bytes = 2 items × 2 bytes
b'\x00\x00\x01\x00'
>>> from ctypes import c_int
>>> b = c_int(100)
>>> len(b)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: object of type 'c_long' has no len()
>>> bytes(b) # 4 bytes despite not implementing len()
b'd\x00\x00\x00'
I see your point that “the number of bytes in b” can be misleading. I will think about the wording some more. Maybe we can come up with a third alternative, like “the number of bytes given” or something.
|
msg236283 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2015-02-20 13:41 |
Because it is not len(b). I fixed several bugs in Python code which called
len() for bytes-like argument and failed with array.array or memoryview with
non-byte items.
The term "bytes-like object" is slightly misleading. In some cases it implies
indexing and len, and iterating, and may be slicing -- common operations for
bytes, bytearray, array('B'), memoryview().cast('B'). In more narrow meaning
it may require such operations as concatenation (operator +) and .startswith()
-- common for bytes and bytearray. In more general meaning it requires only
the support of buffer protocol and contiguity. In more general meaning it may
be even non-contiguous.
|
msg236287 - (view) |
Author: R. David Murray (r.david.murray) * |
Date: 2015-02-20 14:25 |
How about "the length of b in bytes"?
|
msg236426 - (view) |
Author: Martin Panter (martin.panter) * |
Date: 2015-02-23 04:37 |
Posting bytes-like-param.v4.patch, with David’s suggested wording
|
msg263247 - (view) |
Author: Martin Panter (martin.panter) * |
Date: 2016-04-12 13:25 |
After thinking about Issue 26720 (see also Issue 15994), I think it might be worth documenting not only that bytes-like objects may be passed, but in some cases the class should not be access the object after the method returns. This applies to at least RawIOBase.readinto() and especially RawIOBase.write(). If you want to save the write() data in memory, you have to make a copy, because the original may be lost when BufferedWriter overwrites its internal buffeer.
|
msg263483 - (view) |
Author: Martin Panter (martin.panter) * |
Date: 2016-04-15 11:55 |
Here is an updated patch merged with recent changes. I also added a sentence to the RawIOBase.write() and BufferedIOBase.write() documentation about access to the buffer after the method returns. And I added tests for this.
|
msg265850 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2016-05-19 06:46 |
Added few minor comments on Rietveld.
|
msg266043 - (view) |
Author: Martin Panter (martin.panter) * |
Date: 2016-05-22 02:09 |
I updated my patch thanks to Serhiy’s comments. Also omitted Arg Clinic generated code in this version to avoid future merge conflicts with the hashes.
|
msg266055 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2016-05-22 06:35 |
bytes-like-param.v6.patch LGTM.
|
msg266523 - (view) |
Author: Martin Panter (martin.panter) * |
Date: 2016-05-28 00:34 |
Porting this to Python 2 uncovered a few flaws that I am unsure how to best handle. In this patch I have added workarounds, but perhaps we can find better fixes.
1. The io.BytesIO.write() C implementation does not accept array.array() objects. I suspect it only accepts objects implementing the “new” memoryview() buffer protocol, and array() only implements the “old” buffer() protocol.
In Python 2, is array.array() considered a bytes-like object? If so, the proposed update to the write() documentation is not consistent. Perhaps it is a real bug to fix in the BytesIO implementation. Or is test_array_writes() just going too far for Python 2?
I added code to skip the relevant test in test_array_writes(). Maybe it would be simplest to document that in general, BufferedIOBase.write() only has to accept “new” memoryview()-able objects.
2. The _pyio.BufferedIOBase.readinto() native Python implementation cannot assign into arbitrary bytes-like objects. Python 2 does not have memoryview.cast("B"), buffer() objects seem to be read-only, and I can’t think of any other easy way.
I reverted back to only testing native Python readinto() methods with bytearray(). But perhaps it would be best to document that for Python 2, only memoryview()-able arrays of bytes are acceptable, which should be practical to handle in native Python code.
3. The _pyio.BytesIO.write() and BufferedWriter.write() native Python implementations did not handle obscure bytes-like objects well. I added buffer() calls to the code. But I am not sure this is a good idea, because it will raise warnings with “python -3”. An alternative might be to again only require write() to accept memoryview()-able arrays of bytes.
Perhaps the Python 2 definition of “bytes-like object” should be restricted to the “new” memoryview() buffer protocol, and to 1-D arrays of bytes. I.e. memoryview(byteslike).ndim == 1 and memoryview(byteslike).format == "B".
|
msg266525 - (view) |
Author: Roundup Robot (python-dev) |
Date: 2016-05-28 01:35 |
New changeset bd41151a7429 by Martin Panter in branch '3.5':
Issue #20699: Document that “io” methods accept bytes-like objects
https://hg.python.org/cpython/rev/bd41151a7429
New changeset c1b40a29dc7a by Martin Panter in branch 'default':
Issue #20699: Merge io bytes-like fixes from 3.5
https://hg.python.org/cpython/rev/c1b40a29dc7a
|
msg266537 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2016-05-28 06:44 |
Things are more complex in 2.7. Not all objects supporting old buffer protocol support new buffer protocol, and not all functions accepting old buffer protocol work with new buffer protocol.
>>> buffer(array.array('I'))
<read-only buffer for 0xb70fec00, size -1, offset 0 at 0xb70feb60>
>>> memoryview(array.array('I'))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: cannot make memory view because object does not have the buffer interface
>>> zlib.compress(buffer('abc'))
"x\x9cKLJ\x06\x00\x02M\x01'"
>>> zlib.compress(memoryview('abc'))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: compress() argument 1 must be string or read-only buffer, not memoryview
The definition of the "bytes-like object" term (changed in 74e1c50498f8) looks not correct in 2.7, because it mentions memoryview (rarely supported in 2.7 besides backported from 3.x features), but not buffer (widely supported in 2.7). It is referred only in one place (in Doc/library/hmac.rst, copied from 3.x in issue21306). I would suggest to avoid using this term, and be more specific about supported protocols in every concrete case.
|
msg267057 - (view) |
Author: Martin Panter (martin.panter) * |
Date: 2016-06-03 06:25 |
I agree with avoiding the term “bytes-like object” in 2.7. It is easy to be specific when talking about concrete classes like BufferedWriter and BytesIO. But the harder question is what to specify for the abstract base classes.
In patch v8, I propose to change the documentation to refer to “sequences of bytes which are usable with ‘memoryview’ ”. Technically, some of the _pyio implementations assume more than this (e.g. len(), iteration and slice assignment), but I have left them alone.
|
msg267069 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2016-06-03 07:18 |
Since the _pyio implementation doesn't support general memoryview-like objects, I consider using the new buffer protocol in the C implementation an implementation detail. Other implementations can lack this support. I think it would be better to left the wording more general ("preallocated writable array of bytes"), but mention other concrete example (array('b'), are there other examples in the stdlib?).
|
msg267072 - (view) |
Author: Martin Panter (martin.panter) * |
Date: 2016-06-03 07:46 |
Actually this bug report was opened because an implementation that lacked memoryview support was broken by 2.7. The point is to document that (a subset of) memoryview objects may be passed to custom implementations.
Does this need a “changed in 2.7” notice? I am not so familiar with the history of Python 2 or the various buffer APIs to be sure.
Also see <https://bugs.python.org/issue20699#msg266523>, where I discovered array('b') does not work with io.BytesIO.write(). It seems you have to wrap it in buffer() first, presumably because array() does not support the “new” protocol, but buffer() does.
|
msg267079 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2016-06-03 08:09 |
array is supported explicitly in the Python implementation of readinto().
What if write not "usable with memoryview", but mention memoryview as an example?
The object *b* should be a pre-allocated, writable array of bytes,
such as :class:`bytearray`, :class:`array.array` or
:class:`memoryview`.
|
msg267084 - (view) |
Author: Martin Panter (martin.panter) * |
Date: 2016-06-03 09:59 |
I don’t see the point of mentioning array() objects at all. It’s hard to support array in a Python 2 implementation, as demonstrated by readinto(). And the special support for array('b') won’t help if you pass in array('B') with values 128–255. I see this as an implementation detail of _pyio, rather than a need for others to follow its lead.
The array('b') code was added by r56225. I guess it is experimental code from an early phase of the io module. Also, test_array_writes() is disabled for _pyio in 2.7 (https://hg.python.org/cpython/diff/760a710eb6c1/Lib/test/test_io.py), and in 2.6 (http://svn.python.org/view/python/branches/trunk-bytearray/Lib/test/test_io.py?r1=61775&r2=61774&pathrev=61775&view=patch).
I think it is better to avoid “such as” and be specific about what has to be supported. Perhaps:
readinto(b): The object *b* should be a pre-allocated, writable array of bytes, either :class:`bytearray` or :class:`memoryview`.
.. versionchanged:: 2.7
Support for :class:`memoryview`.
write(b): The object *b* should be an array of bytes, either :class:`str`, :class:`bytearray`, or :class:`memoryview`.
.. versionchanged:: 2.7
Support for :class:`memoryview`.
|
msg267086 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2016-06-03 10:10 |
Why :class:`str` and not :class:`bytes`?
|
msg267087 - (view) |
Author: Martin Panter (martin.panter) * |
Date: 2016-06-03 10:15 |
I thought :class:`str` is better documented in Python 2, but I can write bytes if you prefer. I guess it is more consistent with the rest of io.rst.
|
msg267088 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2016-06-03 10:35 |
Current docs use "bytes or bytearray object" (without references). "bytes" is just an alias to "str" in 2.7, but I prefer to use this name, for consistency with 3.x and for accenting the binary nature of the data.
Your new wording LGTM.
|
msg267791 - (view) |
Author: Martin Panter (martin.panter) * |
Date: 2016-06-08 05:41 |
New patch that I plan to commit:
* Use bytes instead of str
* bytes, bytearray or memoryview for readinto() methods
* bytes or memoryview for write() methods
* Added single “Changed in version 2.7” notice under the IOBase class (rather than every version of readinto and write)
* Since we only require 1-D byte array memoryviews, I resurrected the len(b) text
* Removed outdated XXX comment about supporting buffer API, since bytearray and memoryview of bytes is now all that is required
|
msg268107 - (view) |
Author: Roundup Robot (python-dev) |
Date: 2016-06-10 08:49 |
New changeset ec69518aeebc by Martin Panter in branch '2.7':
Issue #20699: Document that “io” methods should accept memoryview
https://hg.python.org/cpython/rev/ec69518aeebc
|
msg268185 - (view) |
Author: Martin Panter (martin.panter) * |
Date: 2016-06-11 05:28 |
Thanks for you help figuring this out Serhiy, especially for the Python 2 case.
|
msg268187 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2016-06-11 05:58 |
Oh, I forgot to say my LGTM on your last patch. It LGTM.
But I'm feeling the documentation epic about bytes-like objects is far from the end.
|
|
Date |
User |
Action |
Args |
2022-04-11 14:57:58 | admin | set | github: 64898 |
2016-06-11 05:58:43 | serhiy.storchaka | set | messages:
+ msg268187 |
2016-06-11 05:28:44 | martin.panter | set | status: open -> closed resolution: fixed messages:
+ msg268185
stage: commit review -> resolved |
2016-06-10 08:49:42 | python-dev | set | messages:
+ msg268107 |
2016-06-08 05:41:27 | martin.panter | set | files:
+ bytes-like-param.py2.v9.patch
messages:
+ msg267791 stage: patch review -> commit review |
2016-06-03 10:35:24 | serhiy.storchaka | set | messages:
+ msg267088 |
2016-06-03 10:15:54 | martin.panter | set | messages:
+ msg267087 |
2016-06-03 10:10:01 | serhiy.storchaka | set | messages:
+ msg267086 |
2016-06-03 09:59:13 | martin.panter | set | messages:
+ msg267084 |
2016-06-03 08:09:47 | serhiy.storchaka | set | messages:
+ msg267079 |
2016-06-03 07:46:51 | martin.panter | set | messages:
+ msg267072 |
2016-06-03 07:18:49 | serhiy.storchaka | set | messages:
+ msg267069 |
2016-06-03 06:25:52 | martin.panter | set | files:
+ bytes-like-param.py2.v8.patch
messages:
+ msg267057 |
2016-05-28 06:44:41 | serhiy.storchaka | set | messages:
+ msg266537 |
2016-05-28 01:35:48 | python-dev | set | nosy:
+ python-dev messages:
+ msg266525
|
2016-05-28 00:34:35 | martin.panter | set | files:
+ bytes-like-param.py2.patch
messages:
+ msg266523 |
2016-05-22 06:35:27 | serhiy.storchaka | set | messages:
+ msg266055 |
2016-05-22 02:10:00 | martin.panter | set | files:
+ bytes-like-param.v6.patch
messages:
+ msg266043 versions:
+ Python 3.6, - Python 3.4 |
2016-05-19 06:46:06 | serhiy.storchaka | set | messages:
+ msg265850 |
2016-04-15 11:55:44 | martin.panter | set | files:
+ bytes-like-param.v5.patch
messages:
+ msg263483 |
2016-04-12 13:25:06 | martin.panter | set | messages:
+ msg263247 |
2015-02-23 04:37:52 | martin.panter | set | files:
+ bytes-like-param.v4.patch
messages:
+ msg236426 |
2015-02-20 14:25:43 | r.david.murray | set | messages:
+ msg236287 |
2015-02-20 13:41:16 | serhiy.storchaka | set | messages:
+ msg236283 |
2015-02-20 13:37:46 | martin.panter | set | messages:
+ msg236282 |
2015-02-20 13:04:07 | r.david.murray | set | messages:
+ msg236279 |
2015-02-20 04:27:12 | martin.panter | set | files:
+ bytes-like-param.v3.patch
messages:
+ msg236251 |
2015-02-12 13:21:24 | serhiy.storchaka | set | messages:
+ msg235820 title: Behavior of ZipFile with file-like object and BufferedWriter. -> Document that binary IO classes work with bytes-likes objects |
2015-02-12 11:13:08 | martin.panter | set | files:
+ bytes-like-param.v2.patch
messages:
+ msg235805 |
2015-02-11 23:26:06 | serhiy.storchaka | set | nosy:
+ r.david.murray
|
2015-02-11 23:01:51 | serhiy.storchaka | link | issue12340 superseder |
2015-01-10 09:34:53 | serhiy.storchaka | set | nosy:
+ serhiy.storchaka stage: patch review
versions:
+ Python 3.5 |
2015-01-10 07:44:25 | martin.panter | set | files:
+ bytes-like-param.patch
assignee: docs@python components:
+ Documentation
keywords:
+ patch nosy:
+ docs@python messages:
+ msg233802 |
2014-06-09 19:06:38 | ned.deily | set | nosy:
+ pitrou, benjamin.peterson
|
2014-06-08 06:28:00 | martin.panter | set | nosy:
+ martin.panter
messages:
+ msg220019 versions:
+ Python 3.4 |
2014-02-20 09:49:52 | Henning.von.Bargen | create | |