zipfile increase in size #72905

Open
bertjwregeer mannequin opened this issue Nov 16, 2016 · 6 comments
Assignees
Labels
3.7 (EOL) end of life stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error

Comments

@bertjwregeer
Mannequin

bertjwregeer mannequin commented Nov 16, 2016

BPO 28719
Nosy @bertjwregeer, @serhiy-storchaka

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


GitHub fields:

assignee = 'https://github.com/serhiy-storchaka'
closed_at = None
created_at = <Date 2016-11-16.21:16:06.828>
labels = ['3.7', 'type-bug', 'library']
title = 'zipfile increase in size'
updated_at = <Date 2016-11-19.09:14:23.144>
user = 'https://github.com/bertjwregeer'

bugs.python.org fields:

activity = <Date 2016-11-19.09:14:23.144>
actor = 'serhiy.storchaka'
assignee = 'serhiy.storchaka'
closed = False
closed_date = None
closer = None
components = ['Library (Lib)']
creation = <Date 2016-11-16.21:16:06.828>
creator = 'X-Istence'
dependencies = []
files = []
hgrepos = []
issue_num = 28719
keywords = []
message_count = 6.0
messages = ['280992', '281000', '281001', '281004', '281006', '281212']
nosy_count = 2.0
nosy_names = ['X-Istence', 'serhiy.storchaka']
pr_nums = []
priority = 'normal'
resolution = None
stage = 'needs patch'
status = 'open'
superseder = None
type = 'behavior'
url = 'https://bugs.python.org/issue28719'
versions = ['Python 3.6', 'Python 3.7']

@bertjwregeer
Mannequin Author

bertjwregeer mannequin commented Nov 16, 2016

I am the current maintainer of WebOb, and I noticed that a test started failing on Python 3.6 and 3.7.

Granted, the test checks the size of the file created, which is not the brightest idea in a test, but it's been stable since Python 2.5...

https://travis-ci.org/Pylons/webob/jobs/176505096#L224

shows the failure.

_________________________ test_response_file_body_tell _________________________
    def test_response_file_body_tell():
        import zipfile
        from webob.response import ResponseBodyFile
        rbo = ResponseBodyFile(Response())
        assert rbo.tell() == 0
        writer = zipfile.ZipFile(rbo, 'w')
        writer.writestr('zinfo_or_arcname', b'foo')
        writer.close()
>       assert rbo.tell() == 133
E       assert 145 == 133
E        +  where 145 = <bound method ResponseBodyFile.tell of <body_file for <Response at 0x7fa6291f9eb8 200 OK>>>()
E        +    where <bound method ResponseBodyFile.tell of <body_file for <Response at 0x7fa6291f9eb8 200 OK>>> = <body_file for <Response at 0x7fa6291f9eb8 200 OK>>.tell
tests/test_response.py:608: AssertionError

I am not sure that this is necessarily a bug, but it would be good to know why files are no longer generated the same way.

@bertjwregeer bertjwregeer mannequin added performance Performance or resource usage 3.7 (EOL) end of life labels Nov 16, 2016
@serhiy-storchaka
Member

Could you get a dump of rbo data?

@bertjwregeer
Mannequin Author

bertjwregeer mannequin commented Nov 16, 2016

It's literally the string written:

writer.writestr('zinfo_or_arcname', b'foo')

rbo in this case is a simple file-like object.

I can get dumps from Python 3.5 and Python 3.6 if necessary.

@serhiy-storchaka
Member

Please make a dump. It should include not just the literal string written, but also the headers and other special fields.

I tried with rbo = io.BytesIO() and got rbo.tell() == 133. There must be a difference between io.BytesIO and ResponseBodyFile. Maybe ResponseBodyFile is not seekable.
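The seekability theory can be checked without WebOb. The following is a minimal sketch, not WebOb's code: NonSeekableWriter is a hypothetical stand-in that offers write(), tell() and flush() but no seek(), and zip_size is a helper invented for this comparison. The thread reports 133 bytes for io.BytesIO and 145 bytes for the non-seekable file on Python 3.6; exact numbers may differ on other versions, so the sketch only compares the two sizes.

```python
import io
import zipfile


class NonSeekableWriter:
    """Hypothetical write-only file object: write/tell/flush, but no seek()."""

    def __init__(self):
        self._buf = io.BytesIO()

    def write(self, data):
        return self._buf.write(data)

    def tell(self):
        return self._buf.tell()

    def flush(self):
        pass

    def getvalue(self):
        return self._buf.getvalue()


def zip_size(fileobj):
    """Write the same one-member archive as the failing test and return its size."""
    with zipfile.ZipFile(fileobj, 'w') as writer:
        writer.writestr('zinfo_or_arcname', b'foo')
    return fileobj.tell()


seekable_size = zip_size(io.BytesIO())
non_seekable_size = zip_size(NonSeekableWriter())

# The non-seekable archive is expected to be the larger of the two.
print(seekable_size, non_seekable_size)
```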

@bertjwregeer
Mannequin Author

bertjwregeer mannequin commented Nov 16, 2016

Here's a dump from Python 3.6:

b'PK\x03\x04\x14\x00\x08\x00\x00\x00\xc0~pI\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x10\x00\x00\x00zinfo_or_arcnamefoo!es\x8c\x03\x00\x00\x00\x03\x00\x00\x00PK\x01\x02\x14\x03\x14\x00\x08\x00\x00\x00\xc0~pI!es\x8c\x03\x00\x00\x00\x03\x00\x00\x00\x10\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x80\x01\x00\x00\x00\x00zinfo_or_arcnamePK\x05\x06\x00\x00\x00\x00\x01\x00\x01\x00>\x00\x00\x00=\x00\x00\x00\x00\x00'

You are correct that ResponseBodyFile does not have a seek() method and is not seekable. Adding seek() to ResponseBodyFile might be a little more complicated...
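If adding seek() is impractical, one possible workaround is to build the archive in a seekable buffer and copy the bytes over afterwards. This is only a sketch under assumptions: write_zip_via_buffer and WriteOnlySink are hypothetical names, WriteOnlySink merely imitates a target that supports write() alone, and the whole archive is held in memory, which may not suit large responses.

```python
import io
import zipfile


def write_zip_via_buffer(target, members):
    """Build the archive in a seekable BytesIO, then copy it to `target`.

    `target` only needs a write() method; `members` is an iterable of
    (arcname, data) pairs. Returns the number of bytes written.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, 'w') as writer:
        for arcname, data in members:
            writer.writestr(arcname, data)
    payload = buffer.getvalue()
    target.write(payload)
    return len(payload)


class WriteOnlySink:
    """Hypothetical stand-in for a body file that supports only write()."""

    def __init__(self):
        self.chunks = []

    def write(self, data):
        self.chunks.append(bytes(data))
        return len(data)


sink = WriteOnlySink()
written = write_zip_via_buffer(sink, [('zinfo_or_arcname', b'foo')])
archive_bytes = b''.join(sink.chunks)

# Re-open the copied bytes to confirm the archive is intact.
with zipfile.ZipFile(io.BytesIO(archive_bytes)) as reader:
    info = reader.infolist()[0]
    content = reader.read('zinfo_or_arcname')
print(written, content, hex(info.flag_bits))
```

Because the archive is written to a seekable buffer, zipfile can go back and fill in the local file header, so the copied bytes should match what io.BytesIO produces directly.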

@serhiy-storchaka
Member

If the output file is not seekable, zipfile sets bit 3 in the file header flags and writes 12 (or 20, if the ZIP64 extension is used) additional bytes after the compressed data. These bytes contain the CRC and the compressed and uncompressed sizes. The corresponding fields in the local file header are set to zero.

In the case of writestr() this can be considered a regression, since the CRC and sizes can be calculated before writing the compressed data and saved in the local file header.

But this would not be easy to fix.
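The layout described above can be seen in the Python 3.6 dump posted earlier in this thread. The sketch below decodes the first 61 bytes of that dump (local file header, name, data, and the 12 trailing bytes) with struct; the variable names and the LOCAL_HEADER format string are my own, based on the standard 30-byte ZIP local file header.

```python
import struct
import zlib

# First 61 bytes of the Python 3.6 dump: header, name, data, data descriptor.
entry = (
    b'PK\x03\x04\x14\x00\x08\x00\x00\x00\xc0~pI'          # signature .. mod date
    b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'  # CRC and sizes, all zero
    b'\x10\x00\x00\x00'                                   # name length, extra length
    b'zinfo_or_arcname'
    b'foo'
    b'!es\x8c\x03\x00\x00\x00\x03\x00\x00\x00'            # 12 bytes after the data
)

# Standard 30-byte local file header, little-endian.
LOCAL_HEADER = struct.Struct('<4sHHHHHIIIHH')
(signature, version, flags, method, mod_time, mod_date,
 crc, compressed_size, uncompressed_size,
 name_len, extra_len) = LOCAL_HEADER.unpack_from(entry, 0)

name_start = LOCAL_HEADER.size
data_start = name_start + name_len + extra_len
name = entry[name_start:name_start + name_len]

# The header sizes are zero, so read the real values from the trailing
# 12 bytes: CRC, compressed size, uncompressed size (no ZIP64 here).
dd_crc, dd_compressed, dd_uncompressed = struct.unpack_from(
    '<III', entry, len(entry) - 12)
data = entry[data_start:data_start + dd_compressed]

print('bit 3 set:', bool(flags & 0x08))
print('header CRC/sizes:', crc, compressed_size, uncompressed_size)
print('descriptor CRC matches data:', dd_crc == zlib.crc32(data))
```

Those 12 trailing bytes account for the difference between 133 and 145 in the failing test.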

@serhiy-storchaka serhiy-storchaka added the stdlib Python modules in the Lib dir label Nov 19, 2016
@serhiy-storchaka serhiy-storchaka self-assigned this Nov 19, 2016
@serhiy-storchaka serhiy-storchaka added type-bug An unexpected behavior, bug, or error and removed performance Performance or resource usage labels Nov 19, 2016
@ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022