classification
Title: zipfile increase in size
Type: behavior Stage: needs patch
Components: Library (Lib) Versions: Python 3.7, Python 3.6
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: serhiy.storchaka Nosy List: X-Istence, serhiy.storchaka
Priority: normal Keywords:

Created on 2016-11-16 21:16 by X-Istence, last changed 2022-04-11 14:58 by admin.

Messages (6)
msg280992 - (view) Author: Bert JW Regeer (X-Istence) * Date: 2016-11-16 21:16
I am the current maintainer of WebOb, and noticed that a test started failing on Python 3.6 and 3.7.

Granted, the test is checking the size of the file created and it is not the brightest idea in a test, but it's been stable since Python 2.5...

https://travis-ci.org/Pylons/webob/jobs/176505096#L224

shows the failure.

_________________________ test_response_file_body_tell _________________________
    def test_response_file_body_tell():
        import zipfile
        from webob.response import ResponseBodyFile
        rbo = ResponseBodyFile(Response())
        assert rbo.tell() == 0
        writer = zipfile.ZipFile(rbo, 'w')
        writer.writestr('zinfo_or_arcname', b'foo')
        writer.close()
>       assert rbo.tell() == 133
E       assert 145 == 133
E        +  where 145 = <bound method ResponseBodyFile.tell of <body_file for <Response at 0x7fa6291f9eb8 200 OK>>>()
E        +    where <bound method ResponseBodyFile.tell of <body_file for <Response at 0x7fa6291f9eb8 200 OK>>> = <body_file for <Response at 0x7fa6291f9eb8 200 OK>>.tell
tests/test_response.py:608: AssertionError

I am not sure that this is necessarily a bug, but it would be good to know why files are no longer generated the same way.
msg281000 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-11-16 22:23
Could you get a dump of rbo data?
msg281001 - (view) Author: Bert JW Regeer (X-Istence) * Date: 2016-11-16 22:30
It's literally the string written:

writer.writestr('zinfo_or_arcname', b'foo')

rbo in this case is a simple file-like object.

I can get dumps from Python 3.5 and Python 3.6 if necessary.
msg281004 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-11-16 22:45
Please make a dump. It should include not just literally the string written, but headers and other special fields.

I tried with rbo = io.BytesIO() and got rbo.tell() == 133. There must be a difference between io.BytesIO and ResponseBodyFile. Maybe ResponseBodyFile is not seekable.
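
A minimal sketch of that hypothesis, for anyone who wants to reproduce it without WebOb. TellOnlySink is a hypothetical stand-in for ResponseBodyFile (it is not WebOb code) and assumes only that the real object offers write(), tell() and flush() but no seek(). The size of the gap may differ between Python versions, so the sketch prints it rather than hard-coding a number.

```python
import io
import zipfile


class TellOnlySink:
    """Hypothetical stand-in for ResponseBodyFile: write/tell/flush, no seek."""

    def __init__(self):
        self._chunks = []
        self._pos = 0

    def write(self, data):
        data = bytes(data)
        self._chunks.append(data)
        self._pos += len(data)
        return len(data)

    def tell(self):
        return self._pos

    def flush(self):
        pass

    def getvalue(self):
        return b"".join(self._chunks)


def archive_size(fileobj):
    """Write the one-member archive from the failing test; return the final tell()."""
    with zipfile.ZipFile(fileobj, "w") as writer:
        writer.writestr("zinfo_or_arcname", b"foo")
    return fileobj.tell()


seekable_size = archive_size(io.BytesIO())
unseekable_size = archive_size(TellOnlySink())
print(seekable_size, unseekable_size, unseekable_size - seekable_size)
```

If the two sizes differ, the missing seek() is the likely cause rather than anything specific to WebOb.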
msg281006 - (view) Author: Bert JW Regeer (X-Istence) * Date: 2016-11-16 22:58
Here's a dump from Python 3.6:

b'PK\x03\x04\x14\x00\x08\x00\x00\x00\xc0~pI\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x10\x00\x00\x00zinfo_or_arcnamefoo!es\x8c\x03\x00\x00\x00\x03\x00\x00\x00PK\x01\x02\x14\x03\x14\x00\x08\x00\x00\x00\xc0~pI!es\x8c\x03\x00\x00\x00\x03\x00\x00\x00\x10\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x80\x01\x00\x00\x00\x00zinfo_or_arcnamePK\x05\x06\x00\x00\x00\x00\x01\x00\x01\x00>\x00\x00\x00=\x00\x00\x00\x00\x00'

You are correct that ResponseBodyFile does not have a seek() method and is not seekable. Adding seek() to ResponseBodyFile might be a little more complicated...
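
For reference, the dump above can be picked apart with struct to see where the extra bytes sit. This is only a sketch: the byte string is copied from the dump and split across lines for readability, and it assumes the 12 bytes immediately before the central directory are a trailer holding the CRC and the two sizes with no signature in front of them.

```python
import struct
import zlib

# Bytes copied from the Python 3.6 dump above, split for readability.
DUMP = (
    b'PK\x03\x04\x14\x00\x08\x00\x00\x00\xc0~pI'
    b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
    b'\x10\x00\x00\x00zinfo_or_arcnamefoo'
    b'!es\x8c\x03\x00\x00\x00\x03\x00\x00\x00'
    b'PK\x01\x02\x14\x03\x14\x00\x08\x00\x00\x00\xc0~pI'
    b'!es\x8c\x03\x00\x00\x00\x03\x00\x00\x00'
    b'\x10\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
    b'\x80\x01\x00\x00\x00\x00'
    b'zinfo_or_arcname'
    b'PK\x05\x06\x00\x00\x00\x00\x01\x00\x01\x00>\x00\x00\x00=\x00\x00\x00\x00\x00'
)

LOCAL_HEADER = "<4sHHHHHLLLHH"

# Local file header: signature, version, flags, method, time, date,
# CRC, compressed size, uncompressed size, name length, extra length.
(sig, _ver, flags, _method, _time, _date,
 hdr_crc, hdr_csize, hdr_usize, name_len, extra_len) = struct.unpack_from(
    LOCAL_HEADER, DUMP, 0)

# End of central directory record: the last 22 bytes (no archive comment).
(_eocd_sig, _disk, _cd_disk, _n_here, _n_total,
 cd_size, cd_offset, _comment_len) = struct.unpack("<4s4H2LH", DUMP[-22:])

data_start = struct.calcsize(LOCAL_HEADER) + name_len + extra_len

# Assumption: the 12 bytes before the central directory are the trailer.
trailer = DUMP[cd_offset - 12:cd_offset]
dd_crc, dd_csize, dd_usize = struct.unpack("<LLL", trailer)
payload = DUMP[data_start:cd_offset - 12]

print(len(DUMP), hex(flags), (hdr_crc, hdr_csize, hdr_usize))
print(payload, hex(dd_crc), dd_csize, dd_usize)
print(dd_crc == zlib.crc32(payload))
```

The dump is 145 bytes long, which matches the value in the failing assertion, and 145 - 133 = 12 is the length of that trailer.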
msg281212 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-11-19 09:14
If the output file is not seekable, zipfile sets bit 3 in the file header flags and writes 12 (or 20, if the ZIP64 extension is used) additional bytes after the compressed data. These bytes contain the CRC and the compressed and uncompressed sizes. The corresponding fields in the local file header are set to zero.

In the case of writestr() this can be considered a regression, since the CRC and sizes could be calculated before writing the compressed data and stored in the local file header.

But this would not be easy to fix.
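
To illustrate the point about writestr(): because the whole payload is already in memory, the CRC and both sizes are known before a single byte is written. The sketch below computes them for the stored case and for raw deflate. precompute_fields is a hypothetical helper, not part of zipfile, and this is not a proposed patch; it only shows that the values are available up front.

```python
import zlib


def precompute_fields(data, deflate=False):
    """Return (crc, compressed_bytes, uncompressed_size) for an in-memory payload.

    Hypothetical helper for illustration; not zipfile API.
    """
    crc = zlib.crc32(data) & 0xFFFFFFFF
    if deflate:
        # ZIP members hold raw deflate streams, hence the negative wbits.
        compressor = zlib.compressobj(
            zlib.Z_DEFAULT_COMPRESSION, zlib.DEFLATED, -15)
        compressed = compressor.compress(data) + compressor.flush()
    else:
        compressed = data
    return crc, compressed, len(data)


crc, body, size = precompute_fields(b"foo")
print(hex(crc), len(body), size)

deflated_crc, deflated_body, deflated_size = precompute_fields(
    b"foo" * 100, deflate=True)
print(hex(deflated_crc), len(deflated_body), deflated_size)
```

With these three values in hand, the local file header could in principle carry them directly, and bit 3 and the trailing bytes would not be needed for writestr().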
History
Date                 User              Action  Args
2022-04-11 14:58:39  admin             set     github: 72905
2016-11-19 09:14:23  serhiy.storchaka  set     messages: + msg281212; assignee: serhiy.storchaka; components: + Library (Lib); type: resource usage -> behavior; stage: needs patch
2016-11-16 22:58:14  X-Istence         set     messages: + msg281006
2016-11-16 22:45:17  serhiy.storchaka  set     messages: + msg281004
2016-11-16 22:30:25  X-Istence         set     messages: + msg281001
2016-11-16 22:23:20  serhiy.storchaka  set     nosy: + serhiy.storchaka; messages: + msg281000
2016-11-16 21:16:06  X-Istence         create