
classification
Title: zipfile extractall needlessly re-wraps ZipInfo instances
Type: enhancement Stage: patch review
Components: Library (Lib) Versions: Python 3.8
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: peterbe, serhiy.storchaka
Priority: normal Keywords: patch

Created on 2018-02-01 15:10 by peterbe, last changed 2022-04-11 14:58 by admin.

Pull Requests
URL Status Linked Edit
PR 5472 open peterbe, 2018-02-01 15:17
Messages (3)
msg311434 - (view) Author: Peter Bengtsson (peterbe) * Date: 2018-02-01 15:10
The ZipFile class has an extractall method [0] that allows you to leave 'members' empty. If it is empty, 'members' becomes a list of all the *names* of files in the zip. It then iterates over the names and sends each one to `self._extract_member`. But that method needs a ZipInfo object rather than a file name, so it re-wraps it [2].

Instead we can use `self.infolist()` to avoid that re-wrapping inside each `self._extract_member` call. 
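
A rough sketch of the difference, using only the public API rather than the internal `_extract_member`. The helper names (`build_archive`, `extract_by_name`, `extract_by_info`, `read_tree`, `run`) and the sample file names are made up for illustration and are not part of zipfile or of the PR. It assumes the archive has no duplicate entry names.

```python
import io
import os
import tempfile
import zipfile


def build_archive():
    """Build a small in-memory archive (hypothetical sample data)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("a.txt", "alpha")
        zf.writestr("dir/b.txt", "beta")
    return buf


def extract_by_name(zf, path):
    # Pass names: each name has to be resolved back to a ZipInfo.
    for name in zf.namelist():
        zf.extract(name, path)


def extract_by_info(zf, path):
    # Pass ZipInfo objects directly: no per-member lookup by name is needed.
    for info in zf.infolist():
        zf.extract(info, path)


def read_tree(root):
    """Map each extracted file's relative path to its contents."""
    result = {}
    for dirpath, _dirs, files in os.walk(root):
        for filename in files:
            full = os.path.join(dirpath, filename)
            rel = os.path.relpath(full, root).replace(os.sep, "/")
            with open(full, "rb") as fh:
                result[rel] = fh.read()
    return result


def run(extractor):
    """Extract the sample archive into a temp dir and return its contents."""
    with tempfile.TemporaryDirectory() as tmp:
        with zipfile.ZipFile(build_archive()) as zf:
            extractor(zf, tmp)
        return read_tree(tmp)


by_name = run(extract_by_name)
by_info = run(extract_by_info)
print(by_name == by_info)
```

For an archive like this one, both loops should leave the same files on disk; the second one just skips the lookup from name to ZipInfo.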


[0] https://github.com/python/cpython/blob/12e7cd8a51956a5ce373aac692ae6366c5f86584/Lib/zipfile.py#L1579
[1] https://github.com/python/cpython/blob/12e7cd8a51956a5ce373aac692ae6366c5f86584/Lib/zipfile.py#L1586
[2] https://github.com/python/cpython/blob/12e7cd8a51956a5ce373aac692ae6366c5f86584/Lib/zipfile.py#L1615-L1616
msg311435 - (view) Author: Peter Bengtsson (peterbe) * Date: 2018-02-01 15:12
(PS. I'm new to filing Python bugs and submitting patches. I *think* this is the right version. I've only been looking at 'master'.)
msg314539 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2018-03-27 16:53
There is an obscure behavior change introduced by this PR. Technically, a ZIP file can contain several entries with the same name. Currently extractall() will extract the last of them multiple times. The result doesn't differ from extracting it only once; it just wastes time. After merging this PR, extractall() will extract different entries to the same location. If one of them is a directory and another is a file, extractall() will fail.

I'm not sure it can be considered a valid ZIP file, but currently the zipfile module supports it.
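
A minimal sketch of the duplicate-name case, using `read()` in place of extraction so nothing is written to disk. The entry name `same.txt` and its contents are made up for illustration; the zipfile module emits a UserWarning when a duplicate name is written, which is silenced here.

```python
import io
import warnings
import zipfile

buf = io.BytesIO()
with warnings.catch_warnings():
    # zipfile warns about the duplicate name on write; ignore that here.
    warnings.simplefilter("ignore", UserWarning)
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("same.txt", "first")
        zf.writestr("same.txt", "second")

with zipfile.ZipFile(buf) as zf:
    names = zf.namelist()   # the same name appears twice
    infos = zf.infolist()   # two distinct ZipInfo objects

    # Looking up by name resolves to the last entry with that name each time.
    via_names = [zf.read(name) for name in names]
    # Passing the ZipInfo objects reads each entry separately.
    via_infos = [zf.read(info) for info in infos]

print(names)
print(via_names)
print(via_infos)
```

Iterating over names reads the last entry twice, while iterating over ZipInfo objects reads both entries, which is the behavior change described above.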

On the other hand, what is the benefit of using `self.infolist()` instead of `self.namelist()`?
History
Date User Action Args
2022-04-11 14:58:57 admin set github: 76923
2018-03-27 16:53:40 serhiy.storchaka set nosy: + serhiy.storchaka
messages: + msg314539
2018-02-01 15:17:23 peterbe set keywords: + patch
stage: patch review
pull_requests: + pull_request5298
2018-02-01 15:12:05 peterbe set messages: + msg311435
versions: + Python 3.8
2018-02-01 15:10:53 peterbe create