Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

zipfile can't decrypt #49121

Closed
gladed mannequin opened this issue Jan 7, 2009 · 15 comments
Closed

zipfile can't decrypt #49121

gladed mannequin opened this issue Jan 7, 2009 · 15 comments
Assignees
Labels
stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error

Comments

@gladed
Copy link
Mannequin

gladed mannequin commented Jan 7, 2009

BPO 4871
Nosy @amauryfa, @vstinner, @bitdancer
Files
  • zipfile-pwd.patch
  • zipfile_no_unicode_pasword.patch
  • zipfile_no_unicode_pasword-2.patch
  • zipfile_no_unicode_pasword-2-py3k.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/bitdancer'
    closed_at = <Date 2010-12-21.22:03:13.956>
    created_at = <Date 2009-01-07.20:26:24.987>
    labels = ['type-bug', 'library']
    title = "zipfile can't decrypt"
    updated_at = <Date 2010-12-21.22:03:13.955>
    user = 'https://bugs.python.org/gladed'

    bugs.python.org fields:

    activity = <Date 2010-12-21.22:03:13.955>
    actor = 'r.david.murray'
    assignee = 'r.david.murray'
    closed = True
    closed_date = <Date 2010-12-21.22:03:13.956>
    closer = 'r.david.murray'
    components = ['Library (Lib)']
    creation = <Date 2009-01-07.20:26:24.987>
    creator = 'gladed'
    dependencies = []
    files = ['12712', '12723', '12728', '12729']
    hgrepos = []
    issue_num = 4871
    keywords = ['patch']
    message_count = 15.0
    messages = ['79368', '79388', '79421', '79423', '79425', '79426', '79719', '79724', '79771', '79784', '79791', '79793', '124322', '124460', '124463']
    nosy_count = 6.0
    nosy_names = ['amaury.forgeotdarc', 'ggenellina', 'vstinner', 'r.david.murray', 'ebfe', 'gladed']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue4871'
    versions = ['Python 3.1', 'Python 3.2']

    @gladed
    Copy link
    Mannequin Author

    gladed mannequin commented Jan 7, 2009

    If a password is supplied using zpifile.read(objName, password), a
    TypeError occurs:

    File "C:\Program Files\Python30\lib\zipfile.py", line 420, in __init__
    self._UpdateKeys(p)
    File "C:\Program Files\Python30\lib\zipfile.py", line 424, in
    _UpdateKeys
    self.key0 = self._crc32(c, self.key0)
    File "C:\Program Files\Python30\lib\zipfile.py", line 413, in _crc32
    return ((crc >> 8) & 0xffffff) ^ self.crctable[(crc ^ ch) & 0xff]
    TypeError: unsupported operand type(s) for ^: 'int' and 'str'

    This is resolved in zipfile.py by replacing this line in __init__:
    self._UpdateKeys(ord(p))

    @gladed gladed mannequin added stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error labels Jan 7, 2009
    @vstinner
    Copy link
    Member

    vstinner commented Jan 8, 2009

    The key is a bytes string or an unicode string? Using bytes, each
    element is an int instead of a str.

    @gladed
    Copy link
    Mannequin Author

    gladed mannequin commented Jan 8, 2009

    In this case under test, the password string had been clipped from a
    filename, so I suppose it would have been Unicode.

    @ebfe
    Copy link
    Mannequin

    ebfe mannequin commented Jan 8, 2009

    This is basically the same problem as with other bytes-orientated modules.

    The choice is either to reject unicode and force the caller to use
    .encode() on all his strings (so 'password' is an instance of bytes and
    'ch' an instance of int). I'd prefer that.

    Or to check if 'password' is a unicode-object and encode to default
    within zipfile.

    @vstinner
    Copy link
    Member

    vstinner commented Jan 8, 2009

    check if 'password' is a unicode-object
    and encode to default within zipfile.

    Hmmm... what is the default charset? :-) I think that the charset
    depends on the OS and the user locale...

    @ebfe
    Copy link
    Mannequin

    ebfe mannequin commented Jan 8, 2009

    The default encoding is UTF8. Even without that there should be no
    problems while staying in the same environment as the
    character->translation is the same.

    However it violates the rule of least surprise to see zipfile throwing
    CRC errors because the password is something like 'u?è´´n' and encoded
    by some locale-changing-nitwit in korea...

    @ggenellina
    Copy link
    Mannequin

    ggenellina mannequin commented Jan 13, 2009

    Lukas Lueg> The default encoding is UTF8

    What do you mean? Not inside a zip file. The default encoding is CP437
    (the original IBM PC US character set).

    A zipfile password is a sequence of bytes, not characters, as defined
    in the specification. Probably this issue has to wait until it is
    decided what to do in the general case; anyway, encoding the password
    to bytes at the application level seems to be safer.

    @amauryfa
    Copy link
    Member

    in Lib/zipfile.py, the filename is already allowed to be unicode if the
    "flag_bits" for the file entry contains 0x800. This indicates that the
    encoding is utf-8.

    We could do the same for the password. Attached is a tentative patch
    along this idea (for py3k replace 'unicode' with 'str' of course)

    @vstinner
    Copy link
    Member

    I created small archive with a password using zip 2.32 (Ubuntu Gutsy).
    The unicode flag is not set and so ASCII charset is used to encode the
    password. But my password was an unicode password using non-ASCII
    characters and so I get an UnicodeEncodeError ('ascii' codec can't
    encode character u'\xe9' ...). The user (me) doesn't expect such error
    here :-/

    If I encode the unicode password to bytes using
    password.encode('utf-8'), it works as expected. So I would prefer to
    just reject unicode for the password. Attached patch implements that.

    @ggenellina
    Copy link
    Mannequin

    ggenellina mannequin commented Jan 13, 2009

    Yes, the unicode flag is irrelevant to the password. To successfuly
    decrypt a file, one must know the password *and* the encoding in use
    when it was written, so the only sensible thing to do is to accept
    bytes only.

    Your patch looks good to me with a few comments:

    • I'd only check the password in setpassword() and open(); the other
      methods eventually call open().

    • The issue version says "3.0"; but it should be applied to 2.6 2.7
      and 3.1 too.

    • For 3.0 the check is wrong, should say "bytes" instead of "str".
      Even in 2.6 should use "bytes" to make clear the intent.

    • The usual wording for such TypeError is more like "pwd:
      expected 'bytes', got '%s'"

    @vstinner
    Copy link
    Member

    gagenellina: Oops, yes, I wrote my patch for Python trunk. New patch
    only uses two checks for the password type (since extract() calls
    open()). I also changed the error message.

    @vstinner
    Copy link
    Member

    Patch for py3k: str=>bytes, remove "u" string prefix and
    rename "bytes" variable to "data" to avoid conflict with the bytes
    type.

    @bitdancer
    Copy link
    Member

    What about bytearray? Apparently that works pre-patch for at least read, though setpassword rejects it via an assertion.

    Also, the error message should be "expected bytes" rather than "bytes expected". Don't ask me why, that's just the way it is normally done, so the other order sounds weird to this English speaker's ear. Otherwise the py3k patch looks good and tests correctly for me.

    @bitdancer
    Copy link
    Member

    Thinking about this some more, it seems like the chance that someone is using bytearray to pass a password to zipfile is vanishingly small, especially since in non-optimized mode setpassword would have rejected it. So I think that this should go in.

    @bitdancer bitdancer self-assigned this Dec 21, 2010
    @bitdancer
    Copy link
    Member

    Committed in r87430 (with message word order change), backported to 3.1 in r87431. Making the parallel change to 2.7 would be likely to break working code, IMO.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error
    Projects
    None yet
    Development

    No branches or pull requests

    3 participants