classification
Title: zipfile.ZipFile.write() does not accept bytes arcname
Type: behavior Stage: needs patch
Components: Documentation, Library (Lib) Versions: Python 3.4, Python 3.5
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: docs@python Nosy List: Patrik Dufresne, docs@python, iritkatriel, july, matrixise, r.david.murray, serhiy.storchaka
Priority: normal Keywords:

Created on 2015-05-01 21:30 by july, last changed 2020-12-08 19:35 by iritkatriel.

Messages (9)
msg242355 - (view) Author: July Tikhonov (july) * Date: 2015-05-01 21:30
The documentation of zipfile.ZipFile.write() contains the following notice:

"There is no official file name encoding for ZIP files. If you have unicode file names, you must convert them to byte strings in your desired encoding before passing them to write()."

I read this as saying that the 'arcname' argument to write() should be of type bytes rather than str.

But it is str that works, and bytes that does not:

$ ./python
Python 3.5.0a4+ (default:6f6e78931875, May  1 2015, 23:18:40) 
[GCC 4.8.4] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import zipfile
>>> zf = zipfile.ZipFile('foo.zip', 'w')
>>> zf.write('python', 'a')
>>> zf.write('python', b'b')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/july/source/python/Lib/zipfile.py", line 1442, in write
    zinfo = ZipInfo(arcname, date_time)
  File "/home/july/source/python/Lib/zipfile.py", line 322, in __init__
    null_byte = filename.find(chr(0))
TypeError: a bytes-like object is required, not 'str'

(ZipInfo ostensibly attempts to find a zero byte in the filename, but searches instead for a unicode character chr(0). There are several other places in ZipInfo class that assume filename being str rather than bytes.)
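The failure in the traceback above comes from constructing ZipInfo, so it can be probed without a file on disk. The following is a minimal sketch, not part of the original report; the helper name arcname_accepted is hypothetical, and the result for bytes may differ between Python versions.

```python
import zipfile

def arcname_accepted(arcname):
    """Return True if ZipInfo can be built from this name, False on TypeError."""
    try:
        # ZipFile.write() ends up constructing a ZipInfo from the arcname,
        # which is where the traceback in the report originates.
        zipfile.ZipInfo(arcname)
    except TypeError:
        return False
    return True

if __name__ == "__main__":
    print("str arcname accepted:  ", arcname_accepted("a"))
    print("bytes arcname accepted:", arcname_accepted(b"b"))
```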

I consider this a documentation issue: the notice is misleading. Although maybe there is someone who wants to fix the behavior of ZipInfo to allow bytes filename.
msg242356 - (view) Author: Stéphane Wirtel (matrixise) * (Python committer) Date: 2015-05-01 21:41
This documentation is correct for python2 but maybe not for python3.

To check.
msg242373 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2015-05-02 02:36
We should either make it work with byte filenames, or allow control of the filename encoding.  See also issue 20329.  Unfortunately that part is probably a new feature.  In the meantime the docs should be fixed: I believe we automatically encode the filename using the default zip filename codec (but someone should check).
msg242374 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-05-02 04:30
Indeed, the note is outdated and incorrect. First, general unicode filenames are allowed. They are encoded with UTF-8 internally. Second, there is currently no way to create an entry without encoding the filename to UTF-8 (if it is not ASCII-only). So you can't create a ZIP file with an arbitrary filename encoding (e.g. cp866) for old DOS/Windows unzippers.
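The UTF-8 behaviour described here can be observed with an in-memory archive. This is a sketch under assumptions: the helper name describe_stored_name is hypothetical, and it relies on bit 11 (0x800) of the ZIP general-purpose flags being the marker for a UTF-8 encoded name.

```python
import io
import zipfile

def describe_stored_name(arcname):
    """Write one entry to an in-memory archive and report how its name was stored.

    Returns (utf8_flag_set, utf8_bytes_present_in_archive).
    """
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr(arcname, b"data")
    # Re-open the archive to read the flags recorded in the central directory.
    with zipfile.ZipFile(buf) as zf:
        info = zf.getinfo(arcname)
    utf8_flag = bool(info.flag_bits & 0x800)
    raw_name_present = arcname.encode("utf-8") in buf.getvalue()
    return utf8_flag, raw_name_present

if __name__ == "__main__":
    print(describe_stored_name("plain.txt"))
    print(describe_stored_name("привет.txt"))
```

An ASCII-only name should be stored without the flag, while a non-ASCII name should be stored as UTF-8 bytes with the flag set.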

Adding support for bytes filenames is a different issue (issue10757).
msg242392 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2015-05-02 12:31
Ah, I *thought* there was an issue for that, but I didn't find it when I searched.  So this is just a doc issue to fix the docs to reflect current reality.
msg257358 - (view) Author: Patrik Dufresne (Patrik Dufresne) Date: 2016-01-02 20:12
I'm converting my project to Python 3 and I'm running into a problem with zipfile encoding. It looks like it only supports unicode paths. This is a huge issue, since paths are, by definition, bytes. You can store a file name containing an invalid character on the filesystem without any problem.

As such, arcname should support bytes.

Like tar, the zip file format doesn't define a specific encoding. You may store a filename as bytes.
msg257359 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2016-01-02 20:20
As noted, adding that support is the subject of issue 10757.
msg259273 - (view) Author: Patrik Dufresne (Patrik Dufresne) Date: 2016-01-31 02:07
I managed to work around this issue by using surrogateescape for arcname and filename. For me it's no longer an issue.
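The message does not show how surrogateescape was applied, so the following only sketches the general round trip that the error handler makes possible: undecodable bytes are mapped to lone surrogates in a str and mapped back unchanged on encoding. The helper names and the sample name are hypothetical, and whether zipfile accepts the resulting str in every case is not confirmed here.

```python
def name_to_text(name_bytes, encoding="utf-8"):
    """Decode a byte filename, keeping undecodable bytes as lone surrogates."""
    return name_bytes.decode(encoding, "surrogateescape")

def text_to_name(name_text, encoding="utf-8"):
    """Reverse name_to_text(), restoring the original bytes exactly."""
    return name_text.encode(encoding, "surrogateescape")

if __name__ == "__main__":
    raw = b"caf\xe9.txt"  # Latin-1 encoded name, not valid UTF-8
    text = name_to_text(raw)
    print(ascii(text))
    print(text_to_name(text) == raw)
```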
msg382761 - (view) Author: Irit Katriel (iritkatriel) * (Python committer) Date: 2020-12-08 19:35
That part of the documentation was updated here by Serhiy: https://github.com/python/cpython/pull/10592
History
Date  User  Action  Args
2020-12-08 19:35:55  iritkatriel  set  nosy: + iritkatriel; messages: + msg382761
2016-01-31 02:07:41  Patrik Dufresne  set  messages: + msg259273
2016-01-02 20:20:18  r.david.murray  set  messages: + msg257359
2016-01-02 20:12:28  Patrik Dufresne  set  nosy: + Patrik Dufresne; messages: + msg257358
2015-05-02 12:31:29  r.david.murray  set  messages: + msg242392
2015-05-02 04:30:28  serhiy.storchaka  set  versions: - Python 3.6; nosy: + serhiy.storchaka; messages: + msg242374; stage: needs patch
2015-05-02 02:36:20  r.david.murray  set  nosy: + r.david.murray; messages: + msg242373
2015-05-01 21:41:52  matrixise  set  nosy: + matrixise; messages: + msg242356
2015-05-01 21:31:23  july  set  components: + Library (Lib)
2015-05-01 21:30:03  july  create