tarfile.py: fix GNU and USTAR formats to properly handle paths with special characters that are encoded with more than one byte each #69026
Comments
GNU and USTAR formats use a special case if the file path is longer than 100 bytes. The detection for this, though, incorrectly checked for 100 characters rather than 100 bytes. So, if the length was close to but not exceeding 100 characters and included special characters such that the encoded length is greater than 100 bytes, the encoded string was truncated to 100 bytes and thus the resulting file name was truncated within the tar file.

For example:
/gt-education/Colección Educativa Guatemala/thumbs/Libro de Texto Comunicacion y Lenguaje 1 Grado.jpg
is truncated as:
/gt-education/Colección Educativa Guatemala/thumbs/Libro de Texto Comunicacion y Lenguaje 1 Grado.jp

The attached patch fixes this. Initially found on Python 3.3. Patch is tested on Linux with version 3.4.3-6 from Debian. Looking at the source code, I am pretty confident that the problem still exists upstream in Python 3.5.
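The character-versus-byte mismatch can be reproduced outside of tarfile. The following is a minimal sketch, not the actual tarfile code: `NAME_FIELD_BYTES`, `exceeds_by_chars` and `exceeds_by_bytes` are hypothetical names introduced for illustration, and it assumes the name is stored as UTF-8 without its leading slash.

```python
# Hypothetical constant mirroring the 100-byte name limit described in the report.
NAME_FIELD_BYTES = 100


def exceeds_by_chars(name):
    """Buggy style of check: counts characters, not encoded bytes."""
    return len(name) > NAME_FIELD_BYTES


def exceeds_by_bytes(name, encoding="utf-8"):
    """Byte-based check: counts the bytes that would actually be written."""
    return len(name.encode(encoding)) > NAME_FIELD_BYTES


# Path from the report, assumed to be stored without the leading slash.
# "\u00f3" is the accented o, which takes two bytes in UTF-8.
name = ("gt-education/Colecci\u00f3n Educativa Guatemala/thumbs/"
        "Libro de Texto Comunicacion y Lenguaje 1 Grado.jpg")
encoded = name.encode("utf-8")

print(len(name), len(encoded))
print(exceeds_by_chars(name), exceeds_by_bytes(name))

# Cutting the encoded name at the limit drops the final byte.
truncated = encoded[:NAME_FIELD_BYTES].decode("utf-8")
print(truncated[-3:])
```

Under these assumptions the name is 100 characters but 101 bytes, so the character-based check does not trigger the long-name handling while the byte-based check does, and the cut at 100 bytes leaves a name ending in `.jp`.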
Thanks for the detailed report and the patch. I haven't checked yet, but I suppose that the entire 3.x branch is affected. The first thing I have to do now is to come up with a comprehensive test case.
New changeset d08d6b776694 by Lars Gustäbel in branch '3.5':
New changeset e281a57d5b29 by Lars Gustäbel in branch 'default':
Tests fail on FreeBSD:
http://buildbot.python.org/all/builders/AMD64%20FreeBSD%209.x%203.5/builds/713/steps/test/logs/stdio

Example:
======================================================================
Traceback (most recent call last):
File "/usr/home/buildbot/python/3.5.koobs-freebsd9/build/Lib/test/test_tarfile.py", line 1807, in test_unicode_link1
self._test_ustar_link("0123456789" * 9 + "01234567\xff")
File "/usr/home/buildbot/python/3.5.koobs-freebsd9/build/Lib/test/test_tarfile.py", line 1826, in _test_ustar_link
self.assertEqual(name, t.linkname)
AssertionError: '0123[44 chars]89012345678901234567890123456789012345678901234567\xff' != '0123[44 chars]89012345678901234567890123456789012345678901234567\udcc3\udcbf'
- 01234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567\xff
? ^
+ 01234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567\udcc3\udcbf
? ^^
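One plausible reading of this mismatch, offered as an assumption rather than something stated in the thread: the buildbot decoded the stored UTF-8 bytes with an ASCII encoding and the `surrogateescape` error handler, so each non-ASCII byte came back as a lone surrogate. The sketch below only reproduces the two strings seen in the assertion; it does not use tarfile itself.

```python
# The test name ends in "\xff", which UTF-8 encodes as two bytes.
raw = "\xff".encode("utf-8")
print(raw)

# Assumption: the reading side uses ASCII with surrogateescape, so each
# byte >= 0x80 is mapped to a lone surrogate at U+DC00 + byte value.
roundtrip = raw.decode("ascii", errors="surrogateescape")
print(ascii(roundtrip))

# The mapping is reversible: encoding back restores the original bytes.
restored = roundtrip.encode("ascii", errors="surrogateescape")
print(restored == raw)
```

If that reading is right, the decoded link name would end in `\udcc3\udcbf` instead of `\xff`, which matches the difference shown in the assertion.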
New changeset 78ede2baa146 by Lars Gustäbel in branch '3.5':
New changeset 08835d1e7a50 by Lars Gustäbel in branch 'default':
Sorry for the glitch, I suppose everything works fine now.
FYI the first release including the fix 78ede2baa146 is Python 3.5.2.