Message 377945 - Python tracker


Author	ivan.sorokin.tech
Recipients	ivan.sorokin.tech
Date	2020-10-04.15:24:54
Message-id	<1601825094.51.0.960614764386.issue41928@roundup.psfhosted.org>

Content
Grand unified algorithm to read filenames from zip files correctly:

1. Does the zip entry have a «Unicode Path Extra Field» (0x7075)? Use it for the file name.
2. Is the Unicode flag (0x800) set in the «Flags» field of the zip entry? Assume the «Filename» field is in UTF-8.
3. Does the «HostOS» field of the zip entry have the value 0 (FAT) or 11 (NTFS)? Assume the «Filename» field is in the OEM charset corresponding to the system locale.
4. Otherwise, assume the «Filename» field is in UTF-8.

p7zip with the oemcp patch (https://github.com/unxed/oemcp/) uses exactly this method and processes every zip file in my test set correctly. The test set contains several zips generated by different packers on Windows, macOS and Linux, and by online services. Any zip unpacker that wants to handle non-Latin filenames as gracefully as possible should use the same algorithm.
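For illustration, the four steps could be sketched in Python roughly as follows. This is only a sketch under several assumptions, not the p7zip/oemcp code: the function names and the `oem_encoding` parameter are hypothetical, the caller is expected to supply the raw filename bytes, flags, host OS value and extra field already parsed from the entry header, and the OEM code page is passed in explicitly because Python has no portable way to query it (cp437 is only a placeholder default). The CRC check on the extra field follows the ZIP APPNOTE layout and is not mentioned in the steps above.

```python
import struct
import zlib

UNICODE_PATH_ID = 0x7075   # Info-ZIP Unicode Path Extra Field
UTF8_FLAG = 0x800          # general purpose bit 11
HOST_FAT, HOST_NTFS = 0, 11


def find_unicode_path(extra: bytes, raw_name: bytes):
    """Return the UTF-8 name stored in extra field 0x7075, or None."""
    pos = 0
    while pos + 4 <= len(extra):
        header_id, size = struct.unpack_from("<HH", extra, pos)
        data = extra[pos + 4:pos + 4 + size]
        pos += 4 + size
        if header_id != UNICODE_PATH_ID or len(data) < 5:
            continue
        # Layout: 1-byte version, 4-byte CRC32 of the header filename, UTF-8 name.
        version, name_crc = struct.unpack_from("<BI", data, 0)
        # Assumption (from the ZIP APPNOTE): ignore the field if it is stale.
        if version != 1 or name_crc != zlib.crc32(raw_name):
            continue
        try:
            return data[5:].decode("utf-8")
        except UnicodeDecodeError:
            continue
    return None


def decode_zip_name(raw_name: bytes, flags: int, host_os: int,
                    extra: bytes = b"", oem_encoding: str = "cp437") -> str:
    """Decode a zip entry filename following the four steps above."""
    # Step 1: prefer the Unicode Path Extra Field.
    name = find_unicode_path(extra, raw_name)
    if name is not None:
        return name
    # Step 2: the Unicode flag says the filename field is UTF-8.
    if flags & UTF8_FLAG:
        return raw_name.decode("utf-8", errors="replace")
    # Step 3: FAT/NTFS host means the OEM code page of the locale.
    if host_os in (HOST_FAT, HOST_NTFS):
        return raw_name.decode(oem_encoding, errors="replace")
    # Step 4: everything else is assumed to be UTF-8.
    return raw_name.decode("utf-8", errors="replace")
```

For example, an entry written on a Russian-locale Windows system without the Unicode flag would be decoded with `decode_zip_name(raw, flags=0, host_os=0, oem_encoding="cp866")`. With the standard `zipfile` module, the flags, host OS and extra field are available as `ZipInfo.flag_bits`, `ZipInfo.create_system` and `ZipInfo.extra`, but `ZipInfo.filename` is already decoded, so the raw bytes would have to be recovered separately.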

History
Date	User	Action	Args
2020-10-04 15:24:54	ivan.sorokin.tech	set	recipients: + ivan.sorokin.tech
2020-10-04 15:24:54	ivan.sorokin.tech	set	messageid: <1601825094.51.0.960614764386.issue41928@roundup.psfhosted.org>
2020-10-04 15:24:54	ivan.sorokin.tech	link	issue41928 messages
2020-10-04 15:24:54	ivan.sorokin.tech	create