This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Author wolma
Recipients martin.panter, serhiy.storchaka, vstinner, wolma
Date 2015-03-18.08:52:17
SpamBayes Score -1.0
Marked as misclassified Yes
Message-id <1426668738.75.0.0757435516758.issue23688@psf.upfronthosting.co.za>
In-reply-to
Content
Thanks everyone for the lively discussion ! 

I like Serhiy's idea of making write work with arbitrary objects supporting the buffer protocol. In fact, I noticed before that GzipFile.write misbehaves with array.array input. It pretends to accept that, but it'll use len(data) for calculating the zip file metadata so reading from the file will later fail. I was assuming then that fixing that would be too complicated for a rather exotic usecase, but now that I see how simple it really is I think it should be done.

As for the concrete implementation, I guess an isinstance(data, bytes) check to speed up treatment of the most common input is a good idea, but I am not convinced that bytearray deserves the same attention.

Regarding memoryview.cast('B') vs memoryview.nbytes, I see Serhiy's point of keeping the patch size smaller. Personally though, I find use of nbytes much more self-explanatory than cast('B') the purpose of which was not immediately obvious to me. So I would opt for better readability of the final code rather than optimizing patch size, but I would be ok with either solution since it is not about the essence of the patch anyway.

Finally, the bug I report in issue21560 would be fixed as a side-effect of this patch here (because trying to get a memoryview from str would throw an early TypeError). Still, I think it would be a good idea to try to write to the wrapped fileobj *before* updating self.size and self.crc to be protected from unforeseen errors. So maybe we could include that change in the patch here ?

With all that the final code section could look like this:

        if isinstance(data, bytes):
            length = len(data)
        else:
            data = memoryview(data)
            length = data.nbytes

        if length > 0:
            self.fileobj.write( self.compress.compress(data) )
            self.size = self.size + length
            self.crc = zlib.crc32(data, self.crc) & 0xffffffff
            self.offset += length

        return length

One remaining detail then would be whether one would want to catch the TypeError possibly raised by the memoryview constructor to turn it into something less confusing (after all many users will not know what a memoryview has to do with all this). The above code would throw (with str input for example):

Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "/home/wolma/gzip-bug/Lib/gzip.py", line 340, in write
    data = memoryview(data)
TypeError: memoryview: a bytes-like object is required, not 'str'

Maybe, this could be turned into:

TypeError: must be bytes / bytes-like object, not 'str'  ?

to be consistent with the corresponding error in 'wt' mode ?

Let me know which of the above options you favour and I'll provide a new patch.
History
Date User Action Args
2015-03-18 08:52:18wolmasetrecipients: + wolma, vstinner, martin.panter, serhiy.storchaka
2015-03-18 08:52:18wolmasetmessageid: <1426668738.75.0.0757435516758.issue23688@psf.upfronthosting.co.za>
2015-03-18 08:52:18wolmalinkissue23688 messages
2015-03-18 08:52:17wolmacreate