
Author sjt
Recipients serhiy.storchaka, sjt
Date 2016-09-11.19:21:51
Message-id <1473621712.47.0.891898261231.issue28080@psf.upfronthosting.co.za>
In-reply-to
Content
Suggested NEWS/whatsnew entry:

Add a new *memberNameEncoding* argument to the ZipFile constructor, allowing
:mod:`zipfile` to read filenames in non-conforming encodings from the
zipfile as Unicode.  This implementation assumes all member names have the same encoding.

Motivation:

There are applications in Japan that create zipfiles whose directories contain filenames encoded in Shift JIS.  There may be such software in other countries as well.  Because such names violate the Zip format definition, this patch only adds an option to read these files, not to write them.
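
For illustration, here is a minimal sketch of the problem as it appears without the proposed argument.  It relies only on the existing behaviour that zipfile decodes names as cp437 when the UTF-8 bit is unset.  The archive is fabricated by patching an ASCII placeholder name, which is a test trick of this sketch and not part of the patch; the file name and payload are made up.

```python
import io
import zipfile

# Shift JIS bytes for a sample name (hypothetical example data).
sjis_name = "日本語.txt".encode("shift_jis")
placeholder = b"X" * 6 + b".txt"  # ASCII stand-in of the same byte length
assert len(placeholder) == len(sjis_name)

# Build an ordinary archive, then patch the raw name bytes in both the
# local header and the central directory.  The UTF-8 bit stays unset,
# which mimics the non-conforming archives described above.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_STORED) as zf:
    zf.writestr(placeholder.decode("ascii"), b"hello")
raw = buf.getvalue().replace(placeholder, sjis_name)

with zipfile.ZipFile(io.BytesIO(raw)) as zf:
    listed = zf.namelist()[0]  # decoded as cp437, so it is mojibake
    # Workaround: undo the cp437 decoding, then apply the real codec.
    recovered = listed.encode("cp437").decode("shift_jis")
    payload = zf.read(listed)

print(repr(listed))
print(recovered)
```

The round trip through cp437 is lossless because that codec maps every byte value, but callers have to know to do it; the proposed argument would move this step into the library.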

Done:

(1) Add a memberNameEncoding argument to the main() function, which may be set from the command line with "--membernameencoding={codec}".  This command line option may be used with -e or -l, but not -c or -t.  There is no point to it with the latter two, since they do not print member names.
(2) Add a memberNameEncoding argument to the ZipFile constructor.  This is the only way to set it, so this is global to the ZipFile.
(3) Add this attribute to repr.
(4) Add a check that the mode is `read` in main() and in the ZipFile constructor; if it is not, main() prints USAGE and exits, and the constructor raises RuntimeError.
(5) When retrieving member names in constructing ZipInfo instances, check if memberNameEncoding is set, and if so use it, unless the UTF-8 bit is set. In that case, obey the UTF-8 bit, as the specified encoding is surely user error.
(6) Add a CODEC_USAGE message.
(7) Update the docs (docstrings, library reference, NEWS).
(8) Add tests:
    (a) List a zipfile's SJIS-encoded directory.
    (b) List a UTF-8-encoded directory and an ISO-8859-1-encoded directory as Shift-JIS.
    (c) Check that USAGE is invoked on attempts to write a zipfile in main().
    (d) Check that an appropriate error is raised on attempts to write in other functions.
    Many other tests are run as well.
    ALL TESTS PASS.
(9) Docs build without error.
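
The checks in (4) and (5) can be sketched roughly as follows.  This is not the code in the patch: the function names `check_read_only` and `decode_member_name` and the error message are hypothetical, and the cp437 fallback simply mirrors what zipfile already does when the UTF-8 bit is unset.

```python
UTF8_FLAG = 0x800  # general purpose bit 11: the name is stored as UTF-8


def check_read_only(mode, member_name_encoding):
    """Step (4): refuse the option in any mode other than reading."""
    if member_name_encoding is not None and mode != "r":
        raise RuntimeError("memberNameEncoding is only supported in read mode")


def decode_member_name(raw_name, flag_bits, member_name_encoding=None):
    """Step (5): pick the codec for one member name."""
    if flag_bits & UTF8_FLAG:
        # The archive itself declares UTF-8, so the user's codec is ignored.
        return raw_name.decode("utf-8")
    # Fall back to the historical default when no codec was requested.
    return raw_name.decode(member_name_encoding or "cp437")


sjis = "日本語.txt".encode("shift_jis")
utf8 = "日本語.txt".encode("utf-8")
print(decode_member_name(sjis, 0, "shift_jis"))
print(decode_member_name(utf8, UTF8_FLAG, "shift_jis"))
```

Both calls print the same name: the first through the user-supplied codec, the second because the UTF-8 bit takes precedence.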

To do (?):

(10) NEWS/whatsnew
(11) Check relevant code paths are all covered by tests.
(12) Review docs for clarity and organization.

Not done:

I don't think these are appropriate/needed at this time, but listed in case somebody thinks otherwise.

(13) Add a subtype of RuntimeError (see 8d)?
(14) Issue warning if both membernameencoding and utf-8 bit are set (see 5)?
(15) Support InfoZip encoding extension mentioned in APPNOTE.TXT - .ZIP File Format Specification, v6.3.4.
(16) Support per-member encodings (I think the zipfile standard permits this, but I am not sure).
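
For reference, a rough sketch of what (15) might involve, assuming the extension meant is the Info-ZIP Unicode Path Extra Field (header ID 0x7075: a version byte, a CRC-32 of the header's name field, then the UTF-8 name).  The helper name `unicode_path_from_extra` is hypothetical and nothing like it is in the patch.

```python
import struct
import zlib

UNICODE_PATH_ID = 0x7075  # Info-ZIP Unicode Path Extra Field


def unicode_path_from_extra(extra, raw_name):
    """Return the UTF-8 name stored in the extra field, or None."""
    pos = 0
    while pos + 4 <= len(extra):
        # Each extra record is: header ID (2 bytes), data size (2 bytes), data.
        header_id, size = struct.unpack_from("<HH", extra, pos)
        data = extra[pos + 4 : pos + 4 + size]
        pos += 4 + size
        if header_id != UNICODE_PATH_ID or len(data) < 5:
            continue
        version, name_crc = struct.unpack_from("<BL", data)
        # The CRC ties the field to the current header name, so a stale
        # field left behind after a rename is ignored.
        if version == 1 and name_crc == zlib.crc32(raw_name):
            return data[5:].decode("utf-8")
    return None


raw_name = "日本語.txt".encode("shift_jis")
utf8_name = "日本語.txt".encode("utf-8")
extra = struct.pack(
    "<HHBL", UNICODE_PATH_ID, 5 + len(utf8_name), 1, zlib.crc32(raw_name)
) + utf8_name
print(unicode_path_from_extra(extra, raw_name))
```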