classification
Title: zipfile: inconsistent doc for ZIP64 file size
Type: behavior Stage: needs patch
Components: Documentation Versions: Python 3.8, Python 3.7, Python 3.6
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: docs@python Nosy List: berker.peksag, docs@python, mndavidoff, python-dev, serhiy.storchaka
Priority: normal Keywords:

Created on 2016-12-19 09:04 by mndavidoff, last changed 2018-09-21 09:22 by mndavidoff.

Messages (10)
msg283597 - Author: Monte Davidoff (mndavidoff) Date: 2016-12-19 09:04
The documentation for the zipfile module, https://docs.python.org/3.5/library/zipfile.html, contains inconsistent descriptions of the maximum size of a ZIP file when allowZip64 is False.

The second paragraph in the zipfile module documentation states:

"It can handle ZIP files that use the ZIP64 extensions (that is ZIP files that are more than 4 GiB in size)."

Later on, in the description of the zipfile.ZipFile class, it says:

"If allowZip64 is True (the default) zipfile will create ZIP files that use the ZIP64 extensions when the zipfile is larger than 2 GiB."

The two sizes (4 GiB and 2 GiB) should be the same. According to https://en.wikipedia.org/wiki/Zip_(file_format)#ZIP64, 4 GiB is the correct value.

There is a similar problem in the 2.7.13 documentation, https://docs.python.org/2.7/library/zipfile.html.
msg284448 - Author: Roundup Robot (python-dev) (Python triager) Date: 2017-01-02 03:12
New changeset 4685cd33087b by Berker Peksag in branch '3.5':
Issue #29013: Fix allowZip64 documentation
https://hg.python.org/cpython/rev/4685cd33087b

New changeset 7c5075a14459 by Berker Peksag in branch '3.6':
Issue #29013: Merge from 3.5
https://hg.python.org/cpython/rev/7c5075a14459

New changeset 6ca0f3fcf82f by Berker Peksag in branch 'default':
Issue #29013: Merge from 3.6
https://hg.python.org/cpython/rev/6ca0f3fcf82f
msg284449 - Author: Berker Peksag (berker.peksag) * (Python committer) Date: 2017-01-02 03:13
Thanks for the report and for the analysis, Monte!
msg284469 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-01-02 06:22
The documentation was correct. The zipfile module supports *reading* ZIP files up to 4 GiB without the ZIP64 extensions, but it requires allowZip64=True for *writing* ZIP files larger than 2 GiB.

The 2 GiB limit is safer because the generated ZIP files can then be read by implementations that interpret 32-bit sizes as signed. For example, Java doesn't have unsigned integers, and zipfile and zipimport in old Python versions unpack some fields as signed integers.
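To make the signed/unsigned concern concrete, here is a minimal sketch (an illustration only, not code taken from zipfile, zipimport or Java) that packs a 3 GiB size into a 32-bit little-endian field, the width used by the classic ZIP headers, and decodes it both as unsigned and as signed. Any size between 2 GiB and 4 GiB fits the unsigned field but comes back negative from a signed reader.

```python
import struct

UNSIGNED_32_MAX = 2**32 - 1  # 4294967295: largest value a 32-bit size field can hold
SIGNED_32_MAX = 2**31 - 1    # 2147483647: largest value a signed 32-bit reader accepts

size = 3 * 2**30             # 3 GiB: above the signed limit, below the unsigned one
raw = struct.pack("<I", size)            # stored as an unsigned little-endian field
as_unsigned = struct.unpack("<I", raw)[0]
as_signed = struct.unpack("<i", raw)[0]  # a reader that treats the field as signed

print(as_unsigned)  # 3221225472
print(as_signed)    # -1073741824
```

Capping written sizes at 2 GiB keeps every value within SIGNED_32_MAX, so both kinds of reader decode the same number.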
msg284475 - Author: Monte Davidoff (mndavidoff) Date: 2017-01-02 08:14
Serhiy, thank you for the correction and the additional information. I tried reading a zip file larger than 4 GiB with allowZip64=False, and it worked, so it looks like allowZip64 only applies to writing. I suggest we fix the inconsistency in the documentation as follows:

(1) In the description of the zipfile.ZipFile class, revert the change back to:

"If allowZip64 is True (the default) zipfile will create ZIP files that use the ZIP64 extensions when the zipfile is larger than 2 GiB."

(2) Change the second paragraph in the zipfile module documentation from:

"It can handle ZIP files that use the ZIP64 extensions (that is ZIP files that are more than 4 GiB in size)."

to:

"It can handle ZIP files that use the ZIP64 extensions (that is ZIP files that are more than 2 GiB in size)."
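The observation that allowZip64 only matters when writing can be reproduced without a multi-gigabyte file. The sketch below swaps the size trigger for the entry-count trigger (an assumption made purely to keep the example small; it takes a few seconds to run): it writes 65536 empty members with the default allowZip64=True, checks for the ZIP64 end-of-central-directory signature (0x06064b50, i.e. b"PK\x06\x06"), and then reads the archive back with allowZip64=False. The helper name build_many_entry_zip is hypothetical.

```python
import io
import zipfile

# ZIP64 end-of-central-directory record signature (0x06064b50, little-endian).
ZIP64_EOCD_SIGNATURE = b"PK\x06\x06"


def build_many_entry_zip(count):
    """Write `count` empty members to an in-memory archive (allowZip64 defaults to True)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for i in range(count):
            zf.writestr(f"f{i}.txt", b"")
    return buf.getvalue()


# 65536 entries exceeds the 16-bit count of the classic end record,
# so zipfile has to emit ZIP64 records.
data = build_many_entry_zip(65536)
uses_zip64 = ZIP64_EOCD_SIGNATURE in data

# Reading with allowZip64=False still works: the flag only restricts writing.
with zipfile.ZipFile(io.BytesIO(data), "r", allowZip64=False) as zf:
    member_count = len(zf.namelist())

print(uses_zip64, member_count)  # True 65536
```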
msg284519 - Author: Berker Peksag (berker.peksag) * (Python committer) Date: 2017-01-03 00:57
Right, that's my fault. Apparently I missed the "[...] zipfile will create [...]" part :) What do you think about Monte's suggested changes, Serhiy? Should we just revert 4685cd33087b (1) or do both (1) and (2)?
msg325728 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2018-09-19 07:43
I think we should just revert this. The zipfile module can handle ZIP files up to 4 GiB without the ZIP64 extensions, but it requires the ZIP64 extensions for creating ZIP files larger than 2 GiB. The ZIP64 extensions are also required for ZIP files with more than 65535 files.
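A minimal sketch of the writing side, assuming an in-memory archive: with allowZip64=False, zipfile refuses a 65536th member by raising zipfile.LargeZipFile. The whole `with` block is wrapped so the error is caught whether it surfaces on the write itself or when the archive is closed. The helper name try_write_entries is hypothetical.

```python
import io
import zipfile


def try_write_entries(count, allow_zip64):
    """Hypothetical helper: write `count` empty members to an in-memory archive.

    Returns None on success, or the zipfile.LargeZipFile message if zipfile
    refused because ZIP64 would be needed but allowZip64 is False.
    """
    buf = io.BytesIO()
    try:
        # Wrap the whole `with` so the error is caught whether it is raised
        # by writestr() or by close() when the end records are written.
        with zipfile.ZipFile(buf, "w", allowZip64=allow_zip64) as zf:
            for i in range(count):
                zf.writestr(f"f{i}.txt", b"")
    except zipfile.LargeZipFile as exc:
        return str(exc)
    return None


# 65536 members need ZIP64, which allowZip64=False forbids.
refusal = try_write_entries(65536, allow_zip64=False)
print(refusal)
```

The exact message text may differ between Python versions, so only the exception type should be relied on.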
msg325963 - Author: Monte Davidoff (mndavidoff) Date: 2018-09-21 06:12
Serhiy, merely reverting the change would not fix the originally reported problem in the documentation. Based on your additional information, and to avoid describing the ZIP64 extensions in more than one place, I suggest two changes:

(1) Change the second paragraph in the zipfile module documentation from:

"It can handle ZIP files that use the ZIP64 extensions (that is ZIP files that are more than 4 GiB in size)."

to:

"It can handle ZIP files that use the ZIP64 extensions."

(2) In the description of the zipfile.ZipFile class, change:

"If allowZip64 is True (the default) zipfile will create ZIP files that use the ZIP64 extensions when the zipfile is larger than 4 GiB."

to:

"If allowZip64 is True (the default) zipfile will create ZIP files that use the ZIP64 extensions when the zipfile is larger than 2 GiB or contains more than 65535 files."
msg325974 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2018-09-21 08:23
This would look good too.

To be accurate, zipfile will create ZIP files that use the ZIP64 extensions when:

* It contains more than 65535 files.

* It is larger than 2 GiB. More accurately, when either the offset or the size of the central directory is larger than 2 GiB, so in theory a ZIP file can exceed 2 GiB in total size without using ZIP64.

* The original size of any file is larger than 2 GiB.

I'm not sure we should describe the behavior in full detail.
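The three conditions listed above can be restated as a small predicate. This is a hypothetical helper that follows the description in this message, not zipfile's actual code path; the thresholds (65535 entries, 2**31 - 1 bytes) are the ones quoted in this thread, and the argument values in the examples are illustrative. The second call shows the theoretical case mentioned above: an archive of roughly 2.25 GiB in which neither the central directory offset nor its size crosses 2 GiB.

```python
# Thresholds as described in this thread (assumed, not imported from zipfile).
MAX_ENTRIES = 65535          # classic end record holds a 16-bit entry count
SIGNED_32_MAX = 2**31 - 1    # "larger than 2 GiB"


def would_use_zip64(entry_count, cd_offset, cd_size, member_sizes):
    """Hypothetical helper: list which of the three conditions would require ZIP64."""
    reasons = []
    if entry_count > MAX_ENTRIES:
        reasons.append("more than 65535 entries")
    if cd_offset > SIGNED_32_MAX or cd_size > SIGNED_32_MAX:
        reasons.append("central directory offset or size over 2 GiB")
    if any(size > SIGNED_32_MAX for size in member_sizes):
        reasons.append("a member's uncompressed size over 2 GiB")
    return reasons


print(would_use_zip64(70_000, 0, 0, []))
# ['more than 65535 entries']

# Offset 1.5 GiB + central directory 0.75 GiB: over 2 GiB in total, no condition met.
print(would_use_zip64(60_000, 3 * 2**29, 3 * 2**28, [2**14] * 60_000))
# []
```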
msg325980 - Author: Monte Davidoff (mndavidoff) Date: 2018-09-21 09:22
I agree it may be better if we don't describe all the details of ZIP64. How about this rewording for the second change, so we don't have to give all the details?

(2) In the description of the zipfile.ZipFile class, change:

"If allowZip64 is True (the default) zipfile will create ZIP files that use the ZIP64 extensions when the zipfile is larger than 4 GiB."

to:

"If allowZip64 is True (the default) zipfile will create ZIP files that use the ZIP64 extensions when necessary, for example, when the zipfile is larger than 2 GiB."
History
Date                 User               Action  Args
2018-09-21 09:22:26  mndavidoff         set     messages: + msg325980
2018-09-21 08:23:32  serhiy.storchaka   set     messages: + msg325974
2018-09-21 06:12:18  mndavidoff         set     messages: + msg325963
2018-09-19 07:43:06  serhiy.storchaka   set     messages: + msg325728
                                                versions: + Python 3.8, - Python 3.5
2017-01-03 00:57:11  berker.peksag      set     messages: + msg284519
                                                stage: needs patch
2017-01-02 08:14:50  mndavidoff         set     messages: + msg284475
2017-01-02 06:22:28  serhiy.storchaka   set     status: closed -> open
                                                nosy: + serhiy.storchaka
                                                messages: + msg284469
                                                resolution: fixed ->
                                                stage: resolved -> (no value)
2017-01-02 03:13:23  berker.peksag      set     status: open -> closed
                                                type: behavior
                                                versions: + Python 3.6, Python 3.7
                                                nosy: + berker.peksag
                                                messages: + msg284449
                                                resolution: fixed
                                                stage: resolved
2017-01-02 03:12:15  python-dev         set     nosy: + python-dev
                                                messages: + msg284448
2016-12-19 09:04:33  mndavidoff         create