classification
Title: zipfile can't decrypt
Type: behavior Stage: resolved
Components: Library (Lib) Versions: Python 3.1, Python 3.2
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: r.david.murray Nosy List: amaury.forgeotdarc, ebfe, ggenellina, gladed, r.david.murray, vstinner
Priority: normal Keywords: patch

Created on 2009-01-07 20:26 by gladed, last changed 2010-12-21 22:03 by r.david.murray. This issue is now closed.

Files
File name Uploaded Description Edit
zipfile-pwd.patch amaury.forgeotdarc, 2009-01-13 08:40 review
zipfile_no_unicode_pasword.patch vstinner, 2009-01-13 19:29 review
zipfile_no_unicode_pasword-2.patch vstinner, 2009-01-13 23:22 review
zipfile_no_unicode_pasword-2-py3k.patch vstinner, 2009-01-13 23:23 review
Messages (15)
msg79368 - (view) Author: Glade Diviney (gladed) Date: 2009-01-07 20:26
If a password is supplied using zpifile.read(objName, password), a 
TypeError occurs:

  File "C:\Program Files\Python30\lib\zipfile.py", line 420, in __init__
    self._UpdateKeys(p)
  File "C:\Program Files\Python30\lib\zipfile.py", line 424, in 
_UpdateKeys
    self.key0 = self._crc32(c, self.key0)
  File "C:\Program Files\Python30\lib\zipfile.py", line 413, in _crc32
    return ((crc >> 8) & 0xffffff) ^ self.crctable[(crc ^ ch) & 0xff]
TypeError: unsupported operand type(s) for ^: 'int' and 'str'

This is resolved in zipfile.py by replacing this line in __init__:
             self._UpdateKeys(ord(p))
msg79388 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2009-01-08 01:03
The key is a bytes string or an unicode string? Using bytes, each 
element is an int instead of a str.
msg79421 - (view) Author: Glade Diviney (gladed) Date: 2009-01-08 16:19
In this case under test, the password string had been clipped from a 
filename, so I suppose it would have been Unicode.
msg79423 - (view) Author: Lukas Lueg (ebfe) Date: 2009-01-08 16:41
This is basically the same problem as with other bytes-orientated modules.

The choice is either to reject unicode and force the caller to use
.encode() on all his strings (so 'password' is an instance of bytes and
'ch' an instance of int). I'd prefer that.

Or to check if 'password' is a unicode-object and encode to default
within zipfile.
msg79425 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2009-01-08 16:50
> check if 'password' is a unicode-object 
> and encode to default within zipfile. 

Hmmm... what is the default charset? :-) I think that the charset 
depends on the OS and the user locale...
msg79426 - (view) Author: Lukas Lueg (ebfe) Date: 2009-01-08 16:59
The default encoding is UTF8. Even without that there should be no
problems while staying in the same environment as the
character->translation is the same.

However it violates the rule of least surprise to see zipfile throwing
CRC errors because the password is something like 'u?è´´n' and encoded
by some locale-changing-nitwit in korea...
msg79719 - (view) Author: Gabriel Genellina (ggenellina) Date: 2009-01-13 05:34
Lukas Lueg> The default encoding is UTF8

What do you mean? Not inside a zip file. The default encoding is CP437 
(the original IBM PC US character set).

A zipfile password is a sequence of bytes, not characters, as defined 
in the specification. Probably this issue has to wait until it is 
decided what to do in the general case; anyway, encoding the password 
to bytes at the application level seems to be safer.
msg79724 - (view) Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2009-01-13 08:40
in Lib/zipfile.py, the filename is already allowed to be unicode if the
"flag_bits" for the file entry contains 0x800. This indicates that the
encoding is utf-8.

We could do the same for the password. Attached is a tentative patch
along this idea (for py3k replace 'unicode' with 'str' of course)
msg79771 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2009-01-13 19:25
I created small archive with a password using zip 2.32 (Ubuntu Gutsy). 
The unicode flag is not set and so ASCII charset is used to encode the 
password. But my password was an unicode password using non-ASCII 
characters and so I get an UnicodeEncodeError ('ascii' codec can't 
encode character u'\xe9' ...). The user (me) doesn't expect such error 
here :-/

If I encode the unicode password to bytes using 
password.encode('utf-8'), it works as expected. So I would prefer to 
just reject unicode for the password. Attached patch implements that.
msg79784 - (view) Author: Gabriel Genellina (ggenellina) Date: 2009-01-13 22:45
Yes, the unicode flag is irrelevant to the password. To successfuly 
decrypt a file, one must know the password *and* the encoding in use 
when it was written, so the only sensible thing to do is to accept 
bytes only.

Your patch looks good to me with a few comments:

- I'd only check the password in setpassword() and open(); the other 
methods eventually call open().

- The issue version says "3.0"; but it should be applied to 2.6 2.7 
and 3.1 too.

- For 3.0 the check is wrong, should say "bytes" instead of "str". 
Even in 2.6 should use "bytes" to make clear the intent.

- The usual wording for such TypeError is more like "pwd: 
expected 'bytes', got '%s'"
msg79791 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2009-01-13 23:22
gagenellina: Oops, yes, I wrote my patch for Python trunk. New patch 
only uses two checks for the password type (since extract() calls 
open()). I also changed the error message.
msg79793 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2009-01-13 23:23
Patch for py3k: str=>bytes, remove "u" string prefix and 
rename "bytes" variable to "data" to avoid conflict with the bytes 
type.
msg124322 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2010-12-18 21:43
What about bytearray?  Apparently that works pre-patch for at least read, though setpassword rejects it via an assertion.

Also, the error message should be "expected bytes" rather than "bytes expected".  Don't ask me why, that's just the way it is normally done, so the other order sounds weird to this English speaker's ear.  Otherwise the py3k patch looks good and tests correctly for me.
msg124460 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2010-12-21 21:36
Thinking about this some more, it seems like the chance that someone is using bytearray to pass a password to zipfile is vanishingly small, especially since in non-optimized mode setpassword would have rejected it.  So I think that this should go in.
msg124463 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2010-12-21 22:03
Committed in r87430 (with message word order change), backported to 3.1 in r87431.  Making the parallel change to 2.7 would be likely to break working code, IMO.
History
Date User Action Args
2010-12-21 22:03:13r.david.murraysetstatus: open -> closed
versions: - Python 2.7
nosy: amaury.forgeotdarc, ggenellina, vstinner, r.david.murray, ebfe, gladed
messages: + msg124463

resolution: fixed
stage: commit review -> resolved
2010-12-21 21:36:15r.david.murraysetassignee: r.david.murray
messages: + msg124460
nosy: amaury.forgeotdarc, ggenellina, vstinner, r.david.murray, ebfe, gladed
stage: commit review
2010-12-18 21:43:24r.david.murraysetnosy: + r.david.murray

messages: + msg124322
versions: + Python 3.2, - Python 2.6, Python 3.0
2009-01-13 23:23:26vstinnersetfiles: + zipfile_no_unicode_pasword-2-py3k.patch
messages: + msg79793
2009-01-13 23:22:20vstinnersetfiles: + zipfile_no_unicode_pasword-2.patch
messages: + msg79791
versions: + Python 2.6, Python 3.1, Python 2.7
2009-01-13 22:46:00ggenellinasetmessages: + msg79784
2009-01-13 19:29:34vstinnersetfiles: + zipfile_no_unicode_pasword.patch
2009-01-13 19:29:20vstinnersetfiles: - zipfile_no_unicode_pasword.patch
2009-01-13 19:25:04vstinnersetfiles: + zipfile_no_unicode_pasword.patch
messages: + msg79771
2009-01-13 08:40:49amaury.forgeotdarcsetfiles: + zipfile-pwd.patch
keywords: + patch
messages: + msg79724
nosy: + amaury.forgeotdarc
2009-01-13 05:34:43ggenellinasetnosy: + ggenellina
messages: + msg79719
2009-01-08 16:59:35ebfesetmessages: + msg79426
2009-01-08 16:50:48vstinnersetmessages: + msg79425
2009-01-08 16:41:01ebfesetnosy: + ebfe
messages: + msg79423
2009-01-08 16:19:25gladedsetmessages: + msg79421
2009-01-08 01:03:41vstinnersetnosy: + vstinner
messages: + msg79388
2009-01-07 20:26:25gladedcreate