Python 3, ZipFile Bug In Chinese #56257

Closed
yaoyu mannequin opened this issue May 10, 2011 · 6 comments
Labels
stdlib Python modules in the Lib dir topic-unicode type-bug An unexpected behavior, bug, or error

Comments

@yaoyu
Mannequin

yaoyu mannequin commented May 10, 2011

BPO 12048
Nosy @birkenfeld, @amauryfa, @vstinner
Files
  • test.zip

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    GitHub fields:

    assignee = None
    closed_at = <Date 2011-05-18.12:01:36.566>
    created_at = <Date 2011-05-10.07:59:56.416>
    labels = ['type-bug', 'library', 'expert-unicode']
    title = 'Python 3, ZipFile Bug In Chinese'
    updated_at = <Date 2011-05-18.12:01:36.565>
    user = 'https://bugs.python.org/yaoyu'

    bugs.python.org fields:

    activity = <Date 2011-05-18.12:01:36.565>
    actor = 'vstinner'
    assignee = 'none'
    closed = True
    closed_date = <Date 2011-05-18.12:01:36.566>
    closer = 'vstinner'
    components = ['Library (Lib)', 'Unicode']
    creation = <Date 2011-05-10.07:59:56.416>
    creator = 'yaoyu'
    dependencies = []
    files = ['21952']
    hgrepos = []
    issue_num = 12048
    keywords = []
    message_count = 6.0
    messages = ['135687', '135837', '135840', '135842', '136226', '136232']
    nosy_count = 4.0
    nosy_names = ['georg.brandl', 'amaury.forgeotdarc', 'vstinner', 'yaoyu']
    pr_nums = []
    priority = 'normal'
    resolution = 'duplicate'
    stage = None
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue12048'
    versions = ['Python 3.1']

    @yaoyu
    Mannequin Author

    yaoyu mannequin commented May 10, 2011

    Python 3, ZipFile Bug In Chinese:
    1. In Python 3.1.3, "复件 test.txt" cannot be extracted from test.zip. The script prints the mangled name and then fails:
    ╕┤╝■ test.txt
    Traceback (most recent call last):
      File "C:\Temp\PythonZipTest\pythonzip.py", line 14, in <module>
        main()
      File "C:\Temp\PythonZipTest\pythonzip.py", line 11, in main
        z.extract(z.namelist()[0])
      File "c:\python31\lib\zipfile.py", line 980, in extract
        return self._extract_member(member, path, pwd)
      File "c:\python31\lib\zipfile.py", line 1023, in _extract_member
        source = self.open(member, pwd=pwd)
      File "c:\python31\lib\zipfile.py", line 928, in open
        % (zinfo.orig_filename, fname))
    zipfile.BadZipfile: File name in directory '╕┤╝■ test.txt' and header b'\xb8\xb4\xbc\xfe test.txt' differ.
    2. In Python 3.2, "复件 test.txt" is extracted from test.zip incorrectly:
      the file is written out as "╕┤╝■ test.txt".

    3. In Python 2.7.1, it works correctly.

              2011-05-10
    Source Code
    ######################################################################
    #coding=gbk

    import zipfile
    import os
    
    def main():
      szTestDir = os.path.dirname(__file__)
      szFile = os.path.join(szTestDir, 'test.zip')
      z = zipfile.ZipFile(szFile)
      print(z.namelist()[0])
      z.extract(z.namelist()[0])
    
    if __name__ == '__main__':
      main()

    @yaoyu yaoyu mannequin added the type-bug An unexpected behavior, bug, or error label May 10, 2011
    @vstinner
    Member

    This is a duplicate of bpo-10801, which was fixed in Python 3.2 by 33543b4e0e5d. Should we backport the fix to Python 3.1, or can you upgrade to Python 3.2?

    Output with Python 3.2: "╕┤╝■ test.txt".

    @vstinner vstinner added stdlib Python modules in the Lib dir topic-unicode labels May 12, 2011
    @amauryfa
    Member

    But according to the initial report, 3.2 does not give the expected behavior either. This zip file actually stores the filename encoded with cp932, which is incorrect according to the specification of the ZIP format (only cp437 and UTF-8 are valid).

    See bpo-10614 for a possible solution: allow users to specify an alternate encoding to handle such invalid files.
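
To make the decoding rule concrete, here is a minimal sketch (not part of the thread) that builds a small archive in memory whose member name is stored as raw non-UTF-8 bytes without the UTF-8 flag, then reads it back with `zipfile`. The helper `make_legacy_zip` and its placeholder-patching trick are illustrative assumptions; the byte values are the ones shown in the traceback of the report. It assumes Python 3.2 or later, where unflagged names are decoded as cp437.

```python
import io
import zipfile

RAW_NAME = b"\xb8\xb4\xbc\xfe test.txt"   # name bytes from the traceback in the report
UTF8_FLAG = 0x800                          # general-purpose bit 11: name is UTF-8


def make_legacy_zip(raw_name: bytes) -> bytes:
    """Hypothetical helper: return a zip whose member name is raw_name, unflagged."""
    placeholder = b"X" * len(raw_name)     # ASCII name of the same byte length
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_STORED) as zf:
        zf.writestr(placeholder.decode("ascii"), b"hello")
    # Patch the name in both the local header and the central directory.
    return buf.getvalue().replace(placeholder, raw_name)


with zipfile.ZipFile(io.BytesIO(make_legacy_zip(RAW_NAME))) as zf:
    info = zf.infolist()[0]
    flagged_utf8 = bool(info.flag_bits & UTF8_FLAG)   # flag is clear for this archive
    listed_name = info.filename                       # decoded as cp437 by zipfile
    payload = zf.read(info)                           # reading still works in 3.2+

# The cp437 decoding is reversible, so the original bytes are still recoverable.
recovered_bytes = listed_name.encode("cp437")
```

Because the flag is clear, `zipfile` has no way to know the real encoding and falls back to cp437, which is why the name comes out as box-drawing characters.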

    @vstinner
    Member

    Oh, right.

    Note: the encoding looks to be GBK, not CP932:

    >>> '\u590d\u4ef6'.encode('gbk')
    b'\xb8\xb4\xbc\xfe'
    >>> '\u590d\u4ef6'.encode('gbk').decode('cp437')
    '╕┤╝■'
    >>> '\u590d\u4ef6'.encode('cp932')
    ...
    UnicodeEncodeError: 'cp932' codec can't encode character '\u590d' ...
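
The round trip in the session above suggests a workaround for names that `zipfile` has already decoded as cp437: re-encode them to get the original bytes back, then decode with the real encoding. A minimal sketch, assuming the caller knows the encoding (GBK for this archive); `repair_name` is a hypothetical helper, not a `zipfile` API.

```python
import zipfile

UTF8_FLAG = 0x800  # general-purpose bit 11: the stored name is UTF-8


def repair_name(info: zipfile.ZipInfo, encoding: str = "gbk") -> str:
    """Hypothetical helper: undo the cp437 decoding of an unflagged member name."""
    if info.flag_bits & UTF8_FLAG:
        return info.filename               # flagged names were decoded as UTF-8
    # Names that zipfile decoded as cp437 re-encode to their original bytes.
    raw = info.filename.encode("cp437")
    return raw.decode(encoding)            # assumption: caller supplies the real encoding


# A stand-in for the entry read from test.zip (flag_bits defaults to 0).
mangled = zipfile.ZipInfo("\u2555\u2524\u255d\u25a0 test.txt")
repaired = repair_name(mangled)            # '\u590d\u4ef6 test.txt'
```

To extract under the repaired name, one option is to read the member with `ZipFile.read(info)` and write the bytes to a path built from `repair_name(info)` rather than calling `extract()`.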

    @vstinner
    Member

    See also bpo-4621.

    @vstinner
    Member

    This issue is just another example of the issue bpo-10614: I'm closing it as a duplicate.
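
For readers on newer interpreters: Python 3.11 and later accept a `metadata_encoding` argument to `zipfile.ZipFile` when reading, which lets the caller name the encoding of unflagged member names. The sketch below is not from the thread. `make_legacy_zip` is a hypothetical helper that fabricates an archive resembling test.zip in memory, and the fallback branch for older versions re-decodes the names by hand.

```python
import io
import sys
import zipfile

GBK_NAME = "\u590d\u4ef6 test.txt".encode("gbk")   # b'\xb8\xb4\xbc\xfe test.txt'


def make_legacy_zip(raw_name: bytes) -> bytes:
    """Hypothetical helper: return a zip whose member name is raw_name, unflagged."""
    placeholder = b"X" * len(raw_name)
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_STORED) as zf:
        zf.writestr(placeholder.decode("ascii"), b"hello")
    # Patch the name in both the local header and the central directory.
    return buf.getvalue().replace(placeholder, raw_name)


def list_names(data: bytes, encoding: str) -> list:
    """Return member names, decoding unflagged names with the given encoding."""
    if sys.version_info >= (3, 11):
        with zipfile.ZipFile(io.BytesIO(data), metadata_encoding=encoding) as zf:
            return zf.namelist()
    # Older versions: names arrive decoded as cp437, so undo that by hand.
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        return [
            i.filename if i.flag_bits & 0x800
            else i.filename.encode("cp437").decode(encoding)
            for i in zf.infolist()
        ]


names = list_names(make_legacy_zip(GBK_NAME), "gbk")
```

Either branch still relies on the caller knowing the encoding, since the archive itself does not record it.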

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022