Message211714
Regression: Behavior of ZipFile with file-like object and BufferedWriter.
The following code worked with Python 2.6:
import io

LOB_BLOCKSIZE = 1024*1024 # 1 MB

class UnbufferedBlobWriter(io.RawIOBase):
    """
    A file-like wrapper for a write-only cx_Oracle BLOB object.
    """
    def __init__(self, blobLocator):
        self.blobLocator = blobLocator
        self.offset = 0
        self.blobLocator.open()

    def seekable(self):
        return True

    def seek(self, offset, whence):
        if whence == 0:
            # Absolute position
            self.offset = offset
        elif whence == 1:
            # Relative to the current position, clamped at 0
            self.offset += offset
            if self.offset < 0:
                self.offset = 0
        elif whence == 2:
            # Relative to the end of the BLOB
            if offset <= 0 and -offset <= self.blobLocator.size():
                self.offset = self.blobLocator.size() + offset
            else:
                raise IOError(96, "Invalid offset for BlobWriter")
        else:
            self._unsupported("seek")
        return self.offset

    def writable(self):
        return True

    def write(self, data, offset=None):
        if offset is None:
            offset = self.offset
        # The BLOB is written at offset + 1 (1-based LOB offsets)
        self.blobLocator.write(bytes(data), offset + 1)
        self.offset = offset + len(data)
        return len(data)

    def close(self):
        self.flush()
        self.blobLocator.close()
def BlobWriter(blobLocator):
    """
    A file-like wrapper (buffered) for a write-only cx_Oracle BLOB object.
    """
    return io.BufferedWriter(UnbufferedBlobWriter(blobLocator), LOB_BLOCKSIZE)
Note: The cx_Oracle BLOB object is used to store binary content inside a database.
It is almost, but not quite, a file-like object.
I'm using it in conjunction with a ZipFile object to store a ZipFile as a BLOB
inside the DB, like this:
    curs.execute("""
        insert into ... values (..., Empty_BLOB())
        returning BDATA into :po_BDATA
        """,
        [..., blobvar])
    blob = BlobWriter(blobvar.getvalue())
    archive = ZipFile(blob, "w", ZIP_DEFLATED)
    for filename in ...:
        self.log.debug("Saving to ZIP file in the DB: %s", filename)
        archive.write(filename, filename)
    archive.close()
This used to work with Python 2.6.
With Python 2.7.5, however, something like this gets written into the blob:
<memory at 0x......>
Digging deeper, I found out that when using the UnbufferedBlobWriter directly
(without BufferedWriter), the problem does not occur.
It seems like the behaviour of the BufferedWriter class changed from 2.6 to 2.7,
most probably caused by the internal optimization of using the memoryview class.
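To make that suspicion checkable without a database, here is a minimal sketch that records what BufferedWriter actually hands to the raw stream's write method. RecordingRaw is a hypothetical stand-in, not part of cx_Oracle or the io module. If the object turns out to be a memoryview, that would explain the output above: in Python 2.7, bytes is an alias for str, so bytes(some_memoryview) yields the repr text rather than the buffer contents.

```python
import io

class RecordingRaw(io.RawIOBase):
    """Hypothetical raw sink: remembers the type and size of each write."""

    def __init__(self):
        io.RawIOBase.__init__(self)
        self.seen_types = []
        self.lengths = []

    def writable(self):
        return True

    def write(self, data):
        # Record what the buffered layer passed in; do not keep the object
        # itself, since it may be a view onto the internal buffer.
        self.seen_types.append(type(data).__name__)
        self.lengths.append(len(data))
        return len(data)

raw = RecordingRaw()
buffered = io.BufferedWriter(raw, 16)
buffered.write(b"hello")
buffered.flush()

# The type name depends on the interpreter and io implementation;
# it is not guaranteed to be bytes.
print(raw.seen_types, raw.lengths)
```

The printed type name is the interesting part. It may differ between interpreter versions and between the C and pure-Python io implementations, so a raw write method should not assume it receives a bytes object.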
As a workaround, I had to change my write method, calling tobytes() if necessary:
    def write(self, data, offset=None):
        if offset is None:
            offset = self.offset
        if hasattr(data, "tobytes"):
            self.blobLocator.write(data.tobytes(), offset + 1)
        else:
            self.blobLocator.write(bytes(data), offset + 1)
        self.offset = offset + len(data)
        return len(data)
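A possible variation on this workaround, as a sketch only: instead of probing for a tobytes attribute, wrap whatever arrives in a memoryview and copy it out. The helper name to_bytes is hypothetical. It assumes the argument supports the buffer protocol, which holds for bytes, bytearray and memoryview.

```python
def to_bytes(data):
    """Return a bytes copy of a buffer-protocol object (hypothetical helper)."""
    if isinstance(data, bytes):
        # Already the right type, no copy needed.
        return data
    # memoryview() accepts bytes-like objects, including another memoryview,
    # and tobytes() copies exactly the bytes the view covers.
    return memoryview(data).tobytes()

print(repr(to_bytes(bytearray(b"abc"))))
print(repr(to_bytes(memoryview(b"abcdef")[2:4])))
```

Inside the write method, the two branches would then collapse into a single call such as self.blobLocator.write(to_bytes(data), offset + 1).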
I'm not sure if this is actually a bug in 2.7 or if my usage of BufferedWriter
is incorrect (see remark).
To understand the problem, it is important to know that the ZipFile.write
method often calls both write and seek on the underlying file object.
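For anyone who wants to reproduce the write/seek pattern without Oracle, here is a rough sketch with an in-memory stand-in for the BLOB. InMemoryRawWriter and its storage and calls attributes are hypothetical names. The class only imitates a seekable, write-only raw stream and makes no claim about how cx_Oracle behaves.

```python
import io
import zipfile

class InMemoryRawWriter(io.RawIOBase):
    """Hypothetical write-only, seekable raw stream backed by a bytearray."""

    def __init__(self):
        io.RawIOBase.__init__(self)
        self.storage = bytearray()
        self.offset = 0
        self.calls = {"write": 0, "seek": 0}

    def writable(self):
        return True

    def seekable(self):
        return True

    def seek(self, offset, whence=0):
        # Also counts tell(), which the io base class implements as seek(0, 1).
        self.calls["seek"] += 1
        if whence == 0:
            new_offset = offset
        elif whence == 1:
            new_offset = self.offset + offset
        elif whence == 2:
            new_offset = len(self.storage) + offset
        else:
            raise ValueError("unsupported whence: %r" % (whence,))
        if new_offset < 0:
            raise ValueError("negative seek position")
        self.offset = new_offset
        return self.offset

    def write(self, data):
        self.calls["write"] += 1
        # Copy the bytes out regardless of the concrete type passed in.
        chunk = memoryview(data).tobytes()
        end = self.offset + len(chunk)
        if self.offset > len(self.storage):
            # Pad with zeros when writing past the current end.
            self.storage.extend(b"\0" * (self.offset - len(self.storage)))
        self.storage[self.offset:end] = chunk
        self.offset = end
        return len(chunk)

raw = InMemoryRawWriter()
buffered = io.BufferedWriter(raw, 1024)
archive = zipfile.ZipFile(buffered, "w", zipfile.ZIP_DEFLATED)
archive.writestr("a.txt", b"hello " * 100)
archive.writestr("b.txt", b"world")
archive.close()
buffered.flush()

# Re-open the stored bytes to confirm that the archive is intact.
check = zipfile.ZipFile(io.BytesIO(bytes(raw.storage)))
print(check.namelist(), raw.calls)
```

The call counts show how often the buffered layer forwarded write and seek to the raw stream. The exact numbers depend on the buffer size and on the zipfile implementation.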
Remark:
If I am mis-using BufferedWriter: What precisely is wrong? And if so,
why is it so complicated to support a buffered-random-writer?
I cannot use io.BufferedRandom because I don't have a read method
(and ZipFile.write does not need that).
Date | User | Action | Args
2014-02-20 09:49:52 | Henning.von.Bargen | set | recipients: + Henning.von.Bargen
2014-02-20 09:49:52 | Henning.von.Bargen | set | messageid: <1392889792.36.0.922412998797.issue20699@psf.upfronthosting.co.za>
2014-02-20 09:49:52 | Henning.von.Bargen | link | issue20699 messages
2014-02-20 09:49:51 | Henning.von.Bargen | create |