tarfile.py: fix GNU and USTAR formats to properly handle paths with special characters that are encoded with more than one byte each #69026

Closed
RoddyShuler mannequin opened this issue Aug 10, 2015 · 7 comments
Labels: stdlib (Python modules in the Lib dir), type-bug (An unexpected behavior, bug, or error)

Comments

@RoddyShuler
Mannequin

RoddyShuler mannequin commented Aug 10, 2015

BPO 24838
Nosy @gustaebel, @vstinner, @serhiy-storchaka
Files
  • fix-tarfile-path-truncation.patch: Patch to fix tarfile truncation with multi-byte characters
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = 'https://github.com/gustaebel'
    closed_at = <Date 2016-04-19.10:02:47.483>
    created_at = <Date 2015-08-10.18:04:23.355>
    labels = ['type-bug', 'library']
    title = 'tarfile.py: fix GNU and USTAR formats to properly handle paths with special characters that are encoded with more than one byte each'
    updated_at = <Date 2016-11-29.12:04:09.406>
    user = 'https://bugs.python.org/RoddyShuler'

    bugs.python.org fields:

    activity = <Date 2016-11-29.12:04:09.406>
    actor = 'vstinner'
    assignee = 'lars.gustaebel'
    closed = True
    closed_date = <Date 2016-04-19.10:02:47.483>
    closer = 'lars.gustaebel'
    components = ['Library (Lib)']
    creation = <Date 2015-08-10.18:04:23.355>
    creator = 'Roddy Shuler'
    dependencies = []
    files = ['40157']
    hgrepos = []
    issue_num = 24838
    keywords = ['patch']
    message_count = 7.0
    messages = ['248363', '248576', '263713', '263719', '263722', '263723', '281986']
    nosy_count = 5.0
    nosy_names = ['lars.gustaebel', 'vstinner', 'python-dev', 'serhiy.storchaka', 'Roddy Shuler']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue24838'
    versions = ['Python 3.5', 'Python 3.6']

    @RoddyShuler
    Mannequin Author

    RoddyShuler mannequin commented Aug 10, 2015

    The GNU and USTAR formats take a special code path when a file path is longer than 100 bytes. The check for this, however, compared the length against 100 characters rather than 100 bytes. As a result, a path of at most 100 characters that contains non-ASCII characters, so that its encoded form exceeds 100 bytes, was cut off at 100 bytes, and the file name stored in the tar file was truncated.

    For example...

    /gt-education/Colección Educativa Guatemala/thumbs/Libro de Texto Comunicacion y Lenguaje 1 Grado.jpg

    is truncated as:

    /gt-education/Colección Educativa Guatemala/thumbs/Libro de Texto Comunicacion y Lenguaje 1 Grado.jp

    The attached patch fixes this. Initially found on Python 3.3. Patch is tested on Linux with version 3.4.3-6 from Debian. Looking at the source code, I am pretty confident that the problem still exists upstream in Python 3.5.
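
The character-versus-byte mismatch described in this report can be sketched roughly as follows. The name used here is hypothetical (it is not the path from the report) and is chosen to be exactly 100 characters but 101 bytes in UTF-8. `needs_long_name` is an illustrative helper, not a function in `tarfile`, and the round trip assumes a Python version that already contains the fix.

```python
import io
import tarfile

# Hypothetical name: 100 characters, but 101 bytes in UTF-8,
# because "é" is encoded as two bytes.
name = "é" + "a" * 99
encoded = name.encode("utf-8")
print(len(name), len(encoded))  # 100 101

LENGTH_NAME = 100  # the tar header name field holds 100 bytes


def needs_long_name(name, encoding="utf-8"):
    """Illustrative check: compare the encoded length, not the character count."""
    return len(name.encode(encoding, "surrogateescape")) > LENGTH_NAME


# A character-based check misses this name; a byte-based check catches it.
print(len(name) > LENGTH_NAME)   # False
print(needs_long_name(name))     # True

# Round trip through an in-memory GNU-format archive.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w", format=tarfile.GNU_FORMAT,
                  encoding="utf-8") as tar:
    tar.addfile(tarfile.TarInfo(name))  # empty member, size 0

buf.seek(0)
with tarfile.open(fileobj=buf, mode="r", encoding="utf-8") as tar:
    stored = tar.getnames()[0]

print(stored == name)
```

On an interpreter with the fix, the stored name should compare equal to the original; on an affected version the last byte of the encoded name would be expected to be lost, as in the report.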

    @RoddyShuler RoddyShuler mannequin added the type-bug An unexpected behavior, bug, or error label Aug 10, 2015
    @gustaebel
    Mannequin

    gustaebel mannequin commented Aug 14, 2015

    Thanks for the detailed report and the patch. I haven't checked yet, but I suppose that the entire 3.x branch is affected. The first thing I have to do now is to come up with a comprehensive testcase.

    @gustaebel gustaebel mannequin added the stdlib Python modules in the Lib dir label Aug 14, 2015
    @gustaebel gustaebel mannequin self-assigned this Aug 14, 2015
    @python-dev
    Mannequin

    python-dev mannequin commented Apr 19, 2016

    New changeset d08d6b776694 by Lars Gustäbel in branch '3.5':
    Issue bpo-24838: tarfile's ustar and gnu formats now correctly calculate name and
    https://hg.python.org/cpython/rev/d08d6b776694

    New changeset e281a57d5b29 by Lars Gustäbel in branch 'default':
    Issue bpo-24838: Merge tarfile fix from 3.5.
    https://hg.python.org/cpython/rev/e281a57d5b29

    @gustaebel gustaebel mannequin closed this as completed Apr 19, 2016
    @vstinner
    Member

    Tests fail on FreeBSD:

    http://buildbot.python.org/all/builders/AMD64%20FreeBSD%209.x%203.5/builds/713/steps/test/logs/stdio

    Example:

    ======================================================================
    FAIL: test_unicode_link1 (test.test_tarfile.UstarUnicodeTest)
    ----------------------------------------------------------------------

    Traceback (most recent call last):
      File "/usr/home/buildbot/python/3.5.koobs-freebsd9/build/Lib/test/test_tarfile.py", line 1807, in test_unicode_link1
        self._test_ustar_link("0123456789" * 9 + "01234567\xff")
      File "/usr/home/buildbot/python/3.5.koobs-freebsd9/build/Lib/test/test_tarfile.py", line 1826, in _test_ustar_link
        self.assertEqual(name, t.linkname)
    AssertionError: '0123[44 chars]89012345678901234567890123456789012345678901234567\xff' != '0123[44 chars]89012345678901234567890123456789012345678901234567\udcc3\udcbf'
    - 01234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567\xff
    ?                                                                                                   ^
    + 01234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567\udcc3\udcbf
    ?                                                                                                   ^^
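
The mismatch in this assertion can be reproduced in isolation. The sketch below assumes the name was written as UTF-8 and then decoded with an ASCII codec and the `surrogateescape` error handler; the codec actually used on the buildbot is an assumption here, not something the log states.

```python
# The link name from the failing test: 99 characters.
name = "0123456789" * 9 + "01234567\xff"

# In UTF-8, "\xff" (U+00FF) becomes the two bytes C3 BF, so the name is 100 bytes.
raw = name.encode("utf-8")
print(len(name), len(raw))  # 99 100

# Assumption: the reader decodes with ASCII plus surrogateescape.
# Each undecodable byte is mapped to a lone surrogate (0xC3 -> U+DCC3, 0xBF -> U+DCBF).
decoded = raw.decode("ascii", "surrogateescape")
print(ascii(decoded[-2:]))  # '\udcc3\udcbf'

# The mapping is lossless: encoding again restores the original bytes.
restored = decoded.encode("ascii", "surrogateescape")
print(restored == raw)  # True
```

The decoded string ends in the same two surrogates that appear on the right-hand side of the `AssertionError`, which is consistent with a non-UTF-8 encoding being in effect when the archive was read back.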

    @vstinner vstinner reopened this Apr 19, 2016
    @python-dev
    Mannequin

    python-dev mannequin commented Apr 19, 2016

    New changeset 78ede2baa146 by Lars Gustäbel in branch '3.5':
    Issue bpo-24838: Fix test_tarfile.py for non-utf8 filesystem encodings.
    https://hg.python.org/cpython/rev/78ede2baa146

    New changeset 08835d1e7a50 by Lars Gustäbel in branch 'default':
    Issue bpo-24838: Merge test_tarfile.py fix from 3.5.
    https://hg.python.org/cpython/rev/08835d1e7a50

    @gustaebel
    Mannequin

    gustaebel mannequin commented Apr 19, 2016

    Sorry for the glitch, I suppose everything works fine now.

    @gustaebel gustaebel mannequin closed this as completed Apr 19, 2016
    @vstinner
    Member

    FYI the first release including the fix 78ede2baa146 is Python 3.5.2.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022