Author gregory.p.smith
Recipients dhillier, gregory.p.smith, iritkatriel, yudilevi
Date 2022-03-22.06:31:22
Having examined the Lib/ code, the existing behavior makes sense. When writing, Python's zipfile module produces modern zipfiles: it sets the utf-8 flag and stores the filename as utf-8 whenever the name is not ASCII. This is desirable for use with all normal zip implementations from the past 10-15 years.
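
To make the writing behavior concrete, here is a minimal sketch that writes one ASCII and one non-ASCII member into an in-memory archive and then reads the flag bits back. The constant name UTF8_FLAG and the member names are just for illustration; 0x800 is general purpose bit 11 in the zip format.

```python
import io
import zipfile

UTF8_FLAG = 0x800  # general purpose bit 11: filename and comment are utf-8

buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("plain.txt", b"a")   # pure ASCII name
    zf.writestr("café.txt", b"b")    # non-ASCII name

# Reopen the archive and record whether each member carries the utf-8 flag.
with zipfile.ZipFile(io.BytesIO(buf.getvalue())) as zf:
    flags = {info.filename: bool(info.flag_bits & UTF8_FLAG)
             for info in zf.infolist()}

# The non-ASCII name is stored in the archive as raw utf-8 bytes.
stored_as_utf8 = "café.txt".encode("utf-8") in buf.getvalue()
```

Only the non-ASCII member should end up with the flag set; the ASCII one is left untouched.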

When reading a zipfile, if the utf-8 flag is not set, we assume cp437 per the PKWARE APPNOTE.TXT "spec".  So our reading is correct as well, even for very old files.
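
A rough way to see the cp437 fallback in action, for illustration only: write an archive with an ASCII placeholder name (so no utf-8 flag is set), then patch the raw bytes so the name contains 0x82, which is "é" in cp437. The byte patching is a hack to simulate an old archive, not something real code should do.

```python
import io
import zipfile

buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("cafX.txt", b"data")  # ASCII name, so the utf-8 flag stays unset

# Simulate a legacy archive: swap in a cp437-encoded byte of the same length.
# The name appears in both the local header and the central directory;
# bytes.replace() patches both occurrences.
legacy = buf.getvalue().replace(b"cafX.txt", b"caf\x82.txt")

with zipfile.ZipFile(io.BytesIO(legacy)) as zf:
    info = zf.infolist()[0]
    name = info.filename          # decoded as cp437 because the flag is unset
    payload = zf.read(info)       # member data is unaffected by the rename
```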

This is being strict in what we produce and lenient in what we accept.  Caveats?  Yes:

If someone does need to produce zipfiles for use with ancient software that does not support utf-8, and that software also does not treat the unknown utf-8 flag as an error condition, it will misinterpret the utf-8 bytes and show corrupt names for anything non-ASCII.

Similarly, even if the names were written as cp437 (as PR 19335 would do), old zip implementations that blindly use the user's locale encoding instead of cp437 will always see corrupt names for non-ASCII input. (aka mojibake?)
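
Both failure modes can be sketched with plain string round trips, no archive needed. The choice of cp1252 as the "user's locale encoding" is an assumption for the example; any non-cp437 legacy codec would show the same kind of damage.

```python
name = "café.txt"

# Caveat 1: utf-8 bytes read by a tool that ignores the flag and assumes cp437.
utf8_seen_as_cp437 = name.encode("utf-8").decode("cp437")

# Caveat 2: cp437 bytes read by a tool that assumes the locale codec
# (cp1252 here, purely as an example).
cp437_seen_as_cp1252 = name.encode("cp437").decode("cp1252")

print(repr(utf8_seen_as_cp437))
print(repr(cp437_seen_as_cp1252))
```

In both cases the bytes on disk are fine; only the reader's choice of codec garbles the name.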

These are not what I'd expect to be normal use cases. Do you have a common practical example of a need for this?

(The PR on issue28080 provides a way to _read_ legacy zip files that used a codec other than cp437 if you know what it was.)
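
Independent of that PR, a commonly used workaround is to undo the cp437 decoding by hand when you know the original codec. This is only a sketch: the helper name redecode_legacy_name is made up here, and it should only be applied to members whose utf-8 flag is not set.

```python
def redecode_legacy_name(name, codec):
    """Reinterpret a member name that zipfile decoded as cp437.

    cp437 maps every byte to a distinct character, so encoding back to
    cp437 recovers the original bytes, which are then decoded with the
    codec the archive was actually written in.
    """
    return name.encode("cp437").decode(codec)


# Simulate what zipfile would hand back for a Shift JIS name without the flag.
original = "日本語.txt"
as_read_by_zipfile = original.encode("shift_jis").decode("cp437")
recovered = redecode_legacy_name(as_read_by_zipfile, "shift_jis")
```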

--- may also be of interest regarding the zip format.
Date                 User             Action  Args
2022-03-22 06:31:22  gregory.p.smith  set     recipients: + gregory.p.smith, dhillier, yudilevi, iritkatriel
2022-03-22 06:31:22  gregory.p.smith  set     messageid: <>
2022-03-22 06:31:22  gregory.p.smith  link    issue40172 messages
2022-03-22 06:31:22  gregory.p.smith  create