This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: zipfile: cannot create zip file from files with non-utf8 filenames
Type: Stage:
Components: Library (Lib) Versions: Python 3.8
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: joelpuig
Priority: normal Keywords:

Created on 2021-08-05 23:39 by joelpuig, last changed 2022-04-11 14:59 by admin.

Files
File name Uploaded Description Edit
mojibake.zip joelpuig, 2021-08-05 23:39 ZIP archive with a single file named "文字化け" (mojibake) encoded in Shift-JIS
Messages (1)
msg399049 - (view) Author: Joel Puig Rubio (joelpuig) Date: 2021-08-05 23:39
I'm attempting to make a script to create ZIP archives from files which filenames are encoded in Shift-JIS. However, Python seems to limit its filenames to ASCII or UTF-8, which means that attempting to archive said files will raise an exception. This is very inconvenient.

joel@bliss:~/test$ python3 -m zipfile -c mojibake.zip .
Traceback (most recent call last):
  File "/usr/lib/python3.8/zipfile.py", line 457, in _encodeFilenameFlags
    return self.filename.encode('ascii'), self.flag_bits
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-7: ordinal not in range(128)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/usr/lib/python3.8/zipfile.py", line 2441, in <module>
    main()
  File "/usr/lib/python3.8/zipfile.py", line 2437, in main
    addToZip(zf, path, zippath)
  File "/usr/lib/python3.8/zipfile.py", line 2426, in addToZip
    addToZip(zf,
  File "/usr/lib/python3.8/zipfile.py", line 2421, in addToZip
    zf.write(path, zippath, ZIP_DEFLATED)
  File "/usr/lib/python3.8/zipfile.py", line 1775, in write
    with open(filename, "rb") as src, self.open(zinfo, 'w') as dest:
  File "/usr/lib/python3.8/zipfile.py", line 1517, in open
    return self._open_to_write(zinfo, force_zip64=force_zip64)
  File "/usr/lib/python3.8/zipfile.py", line 1614, in _open_to_write
    self.fp.write(zinfo.FileHeader(zip64))
  File "/usr/lib/python3.8/zipfile.py", line 447, in FileHeader
    filename, flag_bits = self._encodeFilenameFlags()
  File "/usr/lib/python3.8/zipfile.py", line 459, in _encodeFilenameFlags
    return self.filename.encode('utf-8'), self.flag_bits | 0x800
UnicodeEncodeError: 'utf-8' codec can't encode characters in position 0-7: surrogates not allowed

The zip command from the Linux Info-ZIP package is able to create the same archive with no issues, which I've attached to this issue. Here you can see how the proper filenames are shown in WinRAR once the right encoding is selected: https://i.imgur.com/TVcI95A.png

The same should be seen on any computer using Shift-JIS as their locale.
History
Date User Action Args
2022-04-11 14:59:48adminsetgithub: 89009
2021-08-05 23:39:22joelpuigcreate