Issue29760
Created on 2017-03-08 19:32 by posita, last changed 2022-04-11 14:58 by admin.
Files
File name | Uploaded | Description
---|---|---
tarfail.tar.bz2 | posita, 2017-03-08 19:32 | test case with data files
tarfile.patch | posita, 2017-03-10 23:00 | possible fix
Messages (5) | |||
---|---|---|---|
msg289253 - (view) | Author: Matt B (posita) * | Date: 2017-03-08 19:32 | |
It looks like there's a problem examining ``.tar`` files with no entries:

```
$ # ==================================================================
$ # Extract test cases (attached to this bug report)
$ tar xpvf tarfail.tar.bz2
x tarfail/
x tarfail/tarfail.py
x tarfail/test.tar
x tarfail/test.tar.bz2
$ cd tarfail
$ # ==================================================================
$ # Note that test.tar.bz2 is just test.tar, but bzip2'ed:
$ bzip2 -c test.tar | openssl dgst -sha256 ; openssl dgst -sha256 test.tar.bz2
f4fad25a0e7a451ed906b76846efd6d2699a65b40795b29553addc35bf9a75c8
SHA256(test.tar.bz2)= f4fad25a0e7a451ed906b76846efd6d2699a65b40795b29553addc35bf9a75c8
$ wc -c test.tar* # these are not empty files
   10240 test.tar
      46 test.tar.bz2
   10286 total
$ tar tpvf test.tar # no entries
$ tar tpvf test.tar.bz2 # no entries
$ # ==================================================================
$ # test.tar.bz2 works, but test.tar causes problems (tested in 2.7,
$ # 3.5, and 3.6):
$ python2.7 tarfail.py
opening /…/tarfail/test.tar.bz2
opening /…/tarfail/test.tar
E
======================================================================
ERROR: test_next (__main__.TestTarFileNext)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "tarfail.py", line 29, in test_next
    next_info = tar_file.next()
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/tarfile.py", line 2350, in next
    self.fileobj.seek(self.offset - 1)
IOError: [Errno 22] Invalid argument

----------------------------------------------------------------------
Ran 1 test in 0.005s

FAILED (errors=1)
$ python3.5 tarfail.py
opening /…/tarfail/test.tar.bz2
opening /…/tarfail/test.tar
E
======================================================================
ERROR: test_next (__main__.TestTarFileNext)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "tarfail.py", line 29, in test_next
    next_info = tar_file.next()
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/tarfile.py", line 2273, in next
    self.fileobj.seek(self.offset - 1)
OSError: [Errno 22] Invalid argument

----------------------------------------------------------------------
Ran 1 test in 0.066s

FAILED (errors=1)
$ python3.6 tarfail.py
opening /…/tarfail/test.tar.bz2
opening /…/tarfail/test.tar
E
======================================================================
ERROR: test_next (__main__.TestTarFileNext)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "tarfail.py", line 29, in test_next
    next_info = tar_file.next()
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/tarfile.py", line 2279, in next
    self.fileobj.seek(self.offset - 1)
OSError: [Errno 22] Invalid argument

----------------------------------------------------------------------
Ran 1 test in 0.090s

FAILED (errors=1)
```

Here's the issue (as far as I can tell):

```
$ ipdb tarfail.py
> /…/tarfail/tarfail.py(3)<module>()
      2
----> 3 from __future__ import (
      4     absolute_import, division, print_function, unicode_literals,

ipdb> b /Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/tarfile.py:2350
Breakpoint 1 at /Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/tarfile.py:2350
ipdb> c
opening /…/tarfail/test.tar.bz2
> /Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/tarfile.py(2350)next()
   2349         if self.offset != self.fileobj.tell():
1> 2350             self.fileobj.seek(self.offset - 1)
   2351             if not self.fileobj.read(1):

ipdb> self.fileobj
<bz2.BZ2File object at 0x1067791d0>
ipdb> self.offset, self.fileobj.tell(), self.offset - 1
(0, 512, -1)
ipdb> c
opening /…/tarfail/test.tar
> /Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/tarfile.py(2350)next()
   2349         if self.offset != self.fileobj.tell():
1> 2350             self.fileobj.seek(self.offset - 1)
   2351             if not self.fileobj.read(1):

ipdb> self.fileobj
<open file u'/…/tarfail/test.tar', mode 'rb' at 0x10676dae0>
ipdb> self.offset, self.fileobj.tell(), self.offset - 1
(0, 512, -1)
ipdb> c
E
======================================================================
ERROR: test_next (__main__.TestTarFileNext)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "tarfail.py", line 29, in test_next
    next_info = tar_file.next()
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/tarfile.py", line 2350, in next
    self.fileobj.seek(self.offset - 1)
IOError: [Errno 22] Invalid argument

----------------------------------------------------------------------
Ran 1 test in 38.300s

FAILED (errors=1)
The program exited via sys.exit(). Exit status: True
> /…/tarfail/tarfail.py(3)<module>()
      2
----> 3 from __future__ import (
      4     absolute_import, division, print_function, unicode_literals,

ipdb> EOF
```

Apparently, ``bz2.BZ2File`` allows seeking to pre-0 (negative) values, whereas more primitive files are not so forgiving. The offending line looks like it can be traced back to this commit:

https://github.com/python/cpython/blame/2.7/Lib/tarfile.py#L2350
https://github.com/python/cpython/blame/3.3/Lib/tarfile.py#L2252
https://github.com/python/cpython/blame/3.4/Lib/tarfile.py#L2252
https://github.com/python/cpython/blame/3.5/Lib/tarfile.py#L2273
https://github.com/python/cpython/blame/3.6/Lib/tarfile.py#L2286

(My apologies for not catching this sooner.)
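The difference between the two file objects can be reproduced outside ``tarfile``. What follows is a minimal sketch, not part of the original report: it assumes Python 3, and the helper name ``try_negative_seek`` and the scratch file names are made up for illustration. It reads 512 bytes from a plain file and from a bzip2-compressed copy, then attempts ``seek(-1)`` on each, which is what ``next()`` ends up doing when ``self.offset`` is 0.

```python
import bz2
import os
import tempfile


def try_negative_seek(fileobj):
    """Return the name of the exception raised by seek(-1), or None on success."""
    try:
        fileobj.seek(-1)
    except (OSError, ValueError) as exc:
        return type(exc).__name__
    return None


with tempfile.TemporaryDirectory() as tmp:
    plain_path = os.path.join(tmp, "plain.bin")      # hypothetical scratch file
    bz2_path = os.path.join(tmp, "plain.bin.bz2")    # hypothetical scratch file
    payload = b"\0" * 1024

    with open(plain_path, "wb") as f:
        f.write(payload)
    with bz2.open(bz2_path, "wb") as f:
        f.write(payload)

    # Read one 512-byte block first, mirroring the state seen in the debugger
    # (offset 0, tell() 512).
    with open(plain_path, "rb") as f:
        f.read(512)
        plain_result = try_negative_seek(f)

    with bz2.open(bz2_path, "rb") as f:
        f.read(512)
        # Whether the bz2 object tolerates this may depend on the Python version.
        bz2_result = try_negative_seek(f)

print("plain file:", plain_result)
print("bz2 file:", bz2_result)
```

On a regular file the negative seek is expected to fail with ``OSError`` (``IOError`` on Python 2), matching the tracebacks above. The result for the compressed file is printed rather than assumed.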
|||
msg289265 - (view) | Author: Matt B (posita) * | Date: 2017-03-09 01:53 | |
FWIW, the (offending) fix for #24259 was introduced (e.g., in 2.7) via 2.7.10. I've verified that 2.7.9 works as expected:

```
$ python -V
Python 2.7.9
$ python tarfail.py
opening /…/tarfail/test.tar.bz2
opening /…/tarfail/test.tar
.
----------------------------------------------------------------------
Ran 1 test in 0.010s

OK
```

So this should probably be considered a regression.
|||
msg289408 - (view) | Author: Matt B (posita) * | Date: 2017-03-10 20:21 | |
I'm not sure if it helps at this point, but I've tried several "flavors" of apparently legit tar files with zero entries. All fail.

``tarfile`` module:

```
$ ( set -x ; cd /tmp || exit 1 ; python -V ; rm -fv test.tar ; python -c 'import os, tarfile ; fd = os.open("test.tar", os.O_WRONLY | os.O_CREAT | os.O_EXCL) ; f = os.fdopen(fd, "w") ; f = tarfile.open("test.tar", "w", f) ; f.close() ; f = tarfile.open("test.tar") ; print("okay so far; calling f.next()...") ; f.next()' ; openssl dgst -sha256 test.tar ; rm -fv test.tar )
+/bin/zsh:496> cd /tmp
+/bin/zsh:496> python -V
Python 2.7.13
+/bin/zsh:496> rm -v -fv test.tar
+/bin/zsh:496> python -c 'import os, tarfile ; fd = os.open("test.tar", os.O_WRONLY | os.O_CREAT | os.O_EXCL) ; f = os.fdopen(fd, "w") ; f = tarfile.open("test.tar", "w", f) ; f.close() ; f = tarfile.open("test.tar") ; print("okay so far; calling f.next()...") ; f.next()'
okay so far; calling f.next()...
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/tarfile.py", line 2350, in next
    self.fileobj.seek(self.offset - 1)
IOError: [Errno 22] Invalid argument
+/bin/zsh:496> openssl dgst -sha256 test.tar
SHA256(test.tar)= 84ff92691f909a05b224e1c56abb4864f01b4f8e3c854e4bb4c7baf1d3f6d652
+/bin/zsh:496> rm -v -fv test.tar
test.tar
```

BSD tar (OS X):

```
$ ( set -x ; cd /tmp || exit 1 ; tar --version ; rm -fv test.tar ; tar -cf test.tar -T /dev/null ; python -c 'import tarfile ; f = tarfile.open("test.tar") ; print("okay so far; calling f.next()...") ; f.next()' ; openssl dgst -sha256 test.tar ; rm -fv test.tar )
+/bin/zsh:499> cd /tmp
+/bin/zsh:499> tar --version
bsdtar 2.8.3 - libarchive 2.8.3
+/bin/zsh:499> rm -v -fv test.tar
+/bin/zsh:499> tar -cf test.tar -T /dev/null
+/bin/zsh:499> python -c 'import tarfile ; f = tarfile.open("test.tar") ; print("okay so far; calling f.next()...") ; f.next()'
okay so far; calling f.next()...
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/tarfile.py", line 2350, in next
    self.fileobj.seek(self.offset - 1)
IOError: [Errno 22] Invalid argument
+/bin/zsh:499> openssl dgst -sha256 test.tar
SHA256(test.tar)= 5f70bf18a086007016e948b04aed3b82103a36bea41755b6cddfaf10ace3c6ef
+/bin/zsh:499> rm -v -fv test.tar
test.tar
```

GNU tar (OS X via MacPorts):

```
( set -x ; cd /tmp || exit 1 ; gnutar --version ; rm -fv test.tar ; gnutar -cf test.tar -T /dev/null ; python -c 'import tarfile ; f = tarfile.open("test.tar") ; print("okay so far; calling f.next()...") ; f.next()' ; openssl dgst -sha256 test.tar ; rm -fv test.tar )
+-zsh:23> cd /tmp
+-zsh:23> gnutar --version
tar (GNU tar) 1.29
Copyright (C) 2015 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Written by John Gilmore and Jay Fenlason.
+-zsh:23> rm -v -fv test.tar
+-zsh:23> gnutar -cf test.tar -T /dev/null
+-zsh:23> python -c 'import tarfile ; f = tarfile.open("test.tar") ; print("okay so far; calling f.next()...") ; f.next()'
okay so far; calling f.next()...
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/tarfile.py", line 2350, in next
    self.fileobj.seek(self.offset - 1)
IOError: [Errno 22] Invalid argument
+-zsh:23> openssl dgst -sha256 test.tar
SHA256(test.tar)= 84ff92691f909a05b224e1c56abb4864f01b4f8e3c854e4bb4c7baf1d3f6d652
+-zsh:23> rm -v -fv test.tar
test.tar
```

The discussion from #24259 does not appear to contemplate this case, and seems to imply an assumption that there will be at least one entry (which is not always the case).
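The zero-entry case can also be written as a small script instead of a shell one-liner. This is a sketch under assumptions rather than the attached ``tarfail.py``: it targets Python 3, uses a temporary directory instead of ``/tmp``, and the file name ``empty.tar`` is made up. It creates an archive with no members using ``tarfile`` itself, lists its members, and then calls ``next()`` while catching the error so the outcome is visible either way.

```python
import os
import tarfile
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "empty.tar")  # hypothetical file name

    # Opening for writing and closing immediately yields an archive with no entries.
    with tarfile.open(path, "w"):
        pass
    size = os.path.getsize(path)

    with tarfile.open(path) as archive:
        members = archive.getmembers()
        try:
            next_result = archive.next()
        except OSError as exc:
            # Expected only on interpreters that still have the seek(-1) problem.
            next_result = exc

print("archive size in bytes:", size)
print("members:", members)
print("next() gave:", repr(next_result))
```

An interpreter affected by this issue should report an ``OSError`` from ``next()``, while one without the problem should report ``None``. In both cases the archive itself is non-empty on disk yet has no members.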
|||
msg289409 - (view) | Author: Matt B (posita) * | Date: 2017-03-10 20:31 | |
This patch (also attached) seems to address this particular use case:

```
--- a/Lib/tarfile.py 2016-12-17 12:41:21.000000000 -0800
+++ b/Lib/tarfile.py 2017-03-10 12:23:34.000000000 -0800
@@ -2347,7 +2347,7 @@
 
         # Advance the file pointer.
         if self.offset != self.fileobj.tell():
-            self.fileobj.seek(self.offset - 1)
+            self.fileobj.seek(max(self.offset - 1, 0))
             if not self.fileobj.read(1):
                 raise ReadError("unexpected end of data")
 
```

However, I am unfamiliar with the code, especially in light of #24259, and haven't tested it thoroughly. Oversight is needed.
|||
msg289417 - (view) | Author: Matt B (posita) * | Date: 2017-03-10 23:00 | |
After some consideration, I think this is probably more correct:

```
--- /Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/tarfile.py 2016-12-17 12:41:21.000000000 -0800
+++ tarfile.py 2017-03-10 14:57:15.000000000 -0800
@@ -2347,9 +2347,10 @@
 
         # Advance the file pointer.
         if self.offset != self.fileobj.tell():
-            self.fileobj.seek(self.offset - 1)
+            self.fileobj.seek(max(self.offset - 1, 0))
             if not self.fileobj.read(1):
                 raise ReadError("unexpected end of data")
+            self.fileobj.seek(self.offset)
 
         # Read the next block.
         tarinfo = None
```

But again, I'm no expert here.
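To make the effect of the second patch easier to follow, here is a standalone sketch of the same pointer-advance logic applied to an in-memory stream. It is an illustration only: the function name ``advance_to`` is hypothetical, ``io.BytesIO`` stands in for the archive's file object, and the sketch assumes the added ``seek(self.offset)`` sits inside the ``if`` block.

```python
import io
from tarfile import ReadError


def advance_to(fileobj, offset):
    """Position fileobj at offset, checking that the byte before it exists."""
    if offset != fileobj.tell():
        # Clamp at zero so that an offset of 0 never turns into seek(-1).
        fileobj.seek(max(offset - 1, 0))
        if not fileobj.read(1):
            raise ReadError("unexpected end of data")
        # With offset 0 the read above moved the position to 1, so reposition.
        fileobj.seek(offset)


stream = io.BytesIO(b"\0" * 1024)
stream.read(512)              # mimic the header block that was already consumed
advance_to(stream, 0)         # the empty-archive case from the report
position_after = stream.tell()

try:
    advance_to(stream, 4096)  # an offset well past the end of the data
    truncated = False
except ReadError:
    truncated = True

print("position after advancing to 0:", position_after)
print("truncation detected:", truncated)
```

The clamp alone (the first patch) would leave the position at 1 after the ``read(1)`` when the offset is 0, which is the gap the extra ``seek`` in the second patch appears intended to close.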
History | |||
---|---|---|---|
Date | User | Action | Args |
2022-04-11 14:58:44 | admin | set | github: 73946 |
2017-03-10 23:00:48 | posita | set | files: + tarfile.patch; messages: + msg289417 |
2017-03-10 22:59:13 | posita | set | files: - tarfile.patch |
2017-03-10 20:31:03 | posita | set | files: + tarfile.patch; keywords: + patch; messages: + msg289409 |
2017-03-10 20:21:50 | posita | set | messages: + msg289408 |
2017-03-09 06:05:13 | serhiy.storchaka | set | nosy: + lars.gustaebel; type: crash -> behavior; versions: - Python 3.3, Python 3.4 |
2017-03-09 01:53:15 | posita | set | messages: + msg289265 |
2017-03-08 19:32:09 | posita | create |