patch for bug 1170311 "zipfile UnicodeDecodeError" #45077

Closed
snaury mannequin opened this issue Jun 10, 2007 · 10 comments
Labels: stdlib (Python modules in the Lib dir)

Comments

@snaury
Mannequin

snaury mannequin commented Jun 10, 2007

BPO 1734346
Nosy @loewis
Files
  • python-zipfile-unicode-filenames.patch: Patch and test case
  • python-zipfile-unicode-filenames-utf8.patch: Patch that sets language bit for unicode filenames
  • python-zipfile-unicode-filenames-utf8-2.patch: Patch falls back to ascii when it can, ZipInfo filenames are not damaged after writing
  • python-zipfile-unicode-filenames-utf8-3.patch: Forgot to add test case in the previous patch
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    GitHub fields:

    assignee = 'https://github.com/loewis'
    closed_at = <Date 2008-05-05.17:18:55.519>
    created_at = <Date 2007-06-10.10:53:22.000>
    labels = ['library']
    title = 'patch for bug 1170311 "zipfile UnicodeDecodeError"'
    updated_at = <Date 2008-05-05.21:16:03.646>
    user = 'https://bugs.python.org/snaury'

    bugs.python.org fields:

    activity = <Date 2008-05-05.21:16:03.646>
    actor = 'loewis'
    assignee = 'loewis'
    closed = True
    closed_date = <Date 2008-05-05.17:18:55.519>
    closer = 'loewis'
    components = ['Library (Lib)']
    creation = <Date 2007-06-10.10:53:22.000>
    creator = 'snaury'
    dependencies = []
    files = ['8043', '8044', '8045', '8046']
    hgrepos = []
    issue_num = 1734346
    keywords = ['patch']
    message_count = 10.0
    messages = ['52744', '52745', '52746', '52747', '52748', '65935', '65939', '66274', '66277', '66289']
    nosy_count = 3.0
    nosy_names = ['loewis', 'snaury', 'kalt']
    pr_nums = []
    priority = 'high'
    resolution = 'accepted'
    stage = None
    status = 'closed'
    superseder = None
    type = None
    url = 'https://bugs.python.org/issue1734346'
    versions = []

    @snaury
    Mannequin Author

    snaury mannequin commented Jun 10, 2007

    This patch fixes the UnicodeDecodeError raised when writing a file to a zipfile under a filename of class unicode.

    @snaury snaury mannequin added stdlib Python modules in the Lib dir labels Jun 10, 2007
    @loewis
    Mannequin

    loewis mannequin commented Jun 10, 2007

    This patch is incorrect. It relies on the system encoding, and allows non-string things as file names. What it really should do is to encode in code page 437; bonus points if it falls back to the UTF-8 feature of zip files when that encoding fails.
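
    The approach suggested in this comment can be sketched roughly as follows. This is an illustration written for Python 3, not the patch that was later committed: the helper name `encode_filename_flags` is hypothetical, and the value 0x800 is assumed to be the general-purpose flag bit that the ZIP format uses to mark a UTF-8 file name.

```python
from typing import Tuple

# Assumption: general-purpose bit 11 (0x800) marks a UTF-8 encoded file name.
UTF8_FLAG = 0x800


def encode_filename_flags(filename: str, flag_bits: int = 0) -> Tuple[bytes, int]:
    """Hypothetical helper: return (encoded name, possibly updated flag bits)."""
    # Reject non-string objects instead of relying on implicit conversion.
    if not isinstance(filename, str):
        raise TypeError("filename must be a text string")
    try:
        # Preferred: code page 437, independent of the system encoding.
        return filename.encode("cp437"), flag_bits
    except UnicodeEncodeError:
        # Fallback: UTF-8, signalled through the flag bit.
        return filename.encode("utf-8"), flag_bits | UTF8_FLAG


if __name__ == "__main__":
    print(encode_filename_flags("caf\u00e9.txt"))
    print(encode_filename_flags("\u65e5\u672c.txt"))
```

    The first name fits in code page 437 and leaves the flags alone; the second does not, so it is stored as UTF-8 with the flag bit set.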

    @snaury
    Mannequin Author

    snaury mannequin commented Jun 10, 2007

    File Added: python-zipfile-unicode-filenames-utf8.patch

    @snaury
    Mannequin Author

    snaury mannequin commented Jun 11, 2007

    File Added: python-zipfile-unicode-filenames-utf8-2.patch

    @snaury
    Mannequin Author

    snaury mannequin commented Jun 11, 2007

    File Added: python-zipfile-unicode-filenames-utf8-3.patch

    @loewis loewis mannequin assigned loewis Sep 10, 2007
    @kalt
    Mannequin

    kalt mannequin commented Apr 28, 2008

    Any chance of this making it in sometime?
    The current behaviour is rather limiting/annoying.

    @loewis
    Mannequin

    loewis mannequin commented Apr 28, 2008

    > Any chance of this making it in sometime?

    I'll see what I can do for 2.6, but perhaps it gets delayed until
    2.7/3.1.

    @loewis
    Mannequin

    loewis mannequin commented May 5, 2008

    Thanks for the patch, committed as r62724. I didn't see the need to
    clear the UTF-8 flag, so I left it in (in case somebody wants to inspect
    it).
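
    For readers who want to inspect the flag mentioned here, the following sketch uses the `zipfile` module as shipped with current Python 3 releases, which may differ in detail from the code committed in r62724. It writes one ASCII and one non-ASCII member name to an in-memory archive, reads the archive back, and records whether the 0x800 bit (assumed to be the UTF-8 name flag) is set on each entry.

```python
import io
import zipfile

# Assumption: 0x800 is the general-purpose flag bit that marks a UTF-8 name.
UTF8_FLAG = 0x800

buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("plain.txt", b"ascii name")
    zf.writestr("\u65e5\u672c.txt", b"non-ASCII name")

# Re-open the archive so the flags come from the stored directory entries.
with zipfile.ZipFile(buf) as zf:
    flags = {info.filename: bool(info.flag_bits & UTF8_FLAG) for info in zf.infolist()}

for name, is_utf8 in flags.items():
    print(name, "UTF-8 flag set" if is_utf8 else "UTF-8 flag clear")
```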

    @loewis loewis mannequin closed this as completed May 5, 2008
    @snaury
    Mannequin Author

    snaury mannequin commented May 5, 2008

    Martin, I cleared the flag bit because the filename was changed in-place,
    to mark that the filename does not need further processing. This was
    primarily a compatibility concern, to accommodate situations where users
    try to do such decoding in their own code (this way the flag won't be
    there, so their code won't trigger). Without clearing the flag bit, calling
    _decodeFilenameFlags a second time will fail, as will any similar user
    code.

    I suggest that if users want to know if filename is unicode, they should
    check that filename is of class unicode.

    @loewis
    Mannequin

    loewis mannequin commented May 5, 2008

    > Martin, I cleared the flag bit because the filename was changed in-place,
    > to mark that the filename does not need further processing. This was
    > primarily a compatibility concern, to accommodate situations where users
    > try to do such decoding in their own code (this way the flag won't be
    > there, so their code won't trigger). Without clearing the flag bit, calling
    > _decodeFilenameFlags a second time will fail, as will any similar user
    > code.

    I'm not concerned about the compatibility; code that actually does the
    decoding still might break since it would expect the filename to be a
    byte string if it doesn't explicitly decode. Such assumption would still
    break under your change.

    I am concerned about silently faking data. The library shouldn't do
    that; it should present the flags unmodified, as some application might
    perform further processing (such as displaying the flags to the user).
    It would then be confusing if the data processed isn't the one that was
    read from disk.

    > I suggest that if users want to know if filename is unicode, they should
    > check that filename is of class unicode.

    That won't work in Py3k, which will always decode the filename.
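
    The reading-side behaviour argued for in this comment (decode the name, but hand the flag bits back exactly as they were read) might look like the sketch below. It is a Python 3 illustration under stated assumptions, not the library's code: `decode_filename` is a hypothetical name, 0x800 is assumed to be the UTF-8 name flag, and code page 437 is assumed as the encoding for names without that flag.

```python
# Assumption: general-purpose bit 11 (0x800) marks a UTF-8 encoded file name.
UTF8_FLAG = 0x800


def decode_filename(raw_name: bytes, flag_bits: int) -> str:
    """Hypothetical helper: decode a stored name without touching the flags."""
    if flag_bits & UTF8_FLAG:
        return raw_name.decode("utf-8")
    # Assumption: names without the flag are treated as code page 437.
    return raw_name.decode("cp437")


if __name__ == "__main__":
    stored_flags = UTF8_FLAG
    name = decode_filename("\u65e5\u672c.txt".encode("utf-8"), stored_flags)
    # The caller still sees the flags as they appeared on disk.
    print(name, hex(stored_flags))
```

    Because the function only reads `flag_bits`, an application that displays or further processes the flags sees the same value that was stored in the archive.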

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022