Message 407765 - Python tracker


Author	dhillier
Recipients	accelerator0099, dhillier, eric.smith
Date	2021-12-06.01:27:54
Message-id	<1638754074.77.0.405177301479.issue45981@roundup.psfhosted.org>
In-reply-to

Content
Handling different character sets is not completely supported yet. There are a few open issues relating to this: https://bugs.python.org/issue40407 (reading file names), https://bugs.python.org/issue41928 (support for reading and writing filenames using the Unicode filename extra field) and https://bugs.python.org/issue40172 (problems when reading a filename from a zip and writing it back when the original filename isn't encoded in cp437).

Most modern zip programs that deal with characters outside ASCII or cp437 do one of two things: they set the UTF-8 flag, or they write both an ASCII- or cp437-compatible filename (to the original filename field in the zip header) and the actual filename, with all its non-ASCII characters, in the Unicode filename extra field. I think adding support for the Unicode field to Python would probably cover the majority of files generated by modern zip programs.
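To make the extra-field case concrete, here is a rough sketch of pulling the name out of the Unicode filename extra field by hand. It is not how zipfile does (or would do) it; it assumes the layout I understand from the zip spec: header ID 0x7075, a one-byte version (1), a CRC-32 of the name in the original filename field, then the UTF-8 name. The function name `unicode_path_from_extra` is made up for this example.

```python
import struct
import zlib

UNICODE_PATH_ID = 0x7075  # header ID of the Unicode filename extra field (assumed from the zip spec)


def unicode_path_from_extra(extra: bytes, raw_name: bytes):
    """Return the UTF-8 name stored in the extra data, or None if absent/stale.

    `extra` is the raw extra-field bytes of one entry (e.g. ZipInfo.extra) and
    `raw_name` is the undecoded name from the original filename field.
    """
    pos = 0
    while pos + 4 <= len(extra):
        # Each extra record is: 2-byte ID, 2-byte payload size, payload.
        header_id, size = struct.unpack_from("<HH", extra, pos)
        data = extra[pos + 4 : pos + 4 + size]
        pos += 4 + size
        if len(data) < size:
            break  # truncated record, stop parsing
        if header_id != UNICODE_PATH_ID or size < 5:
            continue
        version, name_crc = struct.unpack_from("<BL", data, 0)
        # The CRC ties the Unicode name to the header name; if the header
        # name was changed by a tool that ignored this field, skip it.
        if version == 1 and name_crc == zlib.crc32(raw_name):
            return data[5:].decode("utf-8")
    return None
```

Getting `raw_name` is the awkward part, since zipfile only exposes the already-decoded name; for an entry without the UTF-8 flag, `info.filename.encode("cp437")` should give the same bytes back.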

For complete support, including older zip programs that don't support the UTF-8 flag or the Unicode filename extra field, we may need to add another parameter to ZipFile's read and write functions so that the caller can override the charset used for the filename stored directly in the zip file header.
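Until something like that exists, a workaround sketch for reading: when the UTF-8 flag is not set, zipfile decodes the header name as cp437, so re-encoding with cp437 recovers the original bytes, which can then be decoded with the charset you know the archive was written in. The helper names below are made up, and the caller has to supply the charset since it can't be detected reliably.

```python
import zipfile

UTF8_FLAG = 0x800  # general purpose bit 11: filename is UTF-8


def recover_name(name: str, flag_bits: int, encoding: str) -> str:
    """Undo zipfile's cp437 decoding for an entry without the UTF-8 flag."""
    if flag_bits & UTF8_FLAG:
        return name  # already decoded correctly as UTF-8
    # cp437 maps every byte value, so this round trip is lossless.
    return name.encode("cp437").decode(encoding)


def list_names(path, encoding):
    """List entry names, assuming legacy entries were written in `encoding`."""
    with zipfile.ZipFile(path) as zf:
        return [
            recover_name(info.filename, info.flag_bits, encoding)
            for info in zf.infolist()
        ]
```

For example, `list_names("archive.zip", "gbk")` for an archive created by an older tool on a Chinese-locale system. This only helps with reading; it doesn't address writing names back in the original charset.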

I've added my thoughts on how to approach this in https://bugs.python.org/issue40172 but haven't had time to implement them myself.

History
Date	User	Action	Args
2021-12-06 01:27:55	dhillier	set	recipients: + dhillier, eric.smith, accelerator0099
2021-12-06 01:27:54	dhillier	set	messageid: <1638754074.77.0.405177301479.issue45981@roundup.psfhosted.org>
2021-12-06 01:27:54	dhillier	link	issue45981 messages
2021-12-06 01:27:54	dhillier	create