This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

classification
Title: tarfile chokes on reading .tar file with no entries (but does fine if the same file is bzip2'ed)
Type: behavior
Components: Library (Lib)
Versions: Python 3.7, Python 3.6, Python 3.5, Python 2.7

process
Status: open
Nosy List: lars.gustaebel, posita
Priority: normal
Keywords: patch

Created on 2017-03-08 19:32 by posita, last changed 2022-04-11 14:58 by admin.

Files
- tarfail.tar.bz2 (posita, 2017-03-08 19:32): test case with data files
- tarfile.patch (posita, 2017-03-10 23:00): possible fix
Messages (5)
msg289253 - Author: Matt B (posita) - Date: 2017-03-08 19:32
It looks like ``tarfile`` has a problem examining ``.tar`` files that contain no entries:

```
$ # ==================================================================
$ # Extract test cases (attached to this bug report)
$ tar xpvf tarfail.tar.bz2
x tarfail/
x tarfail/tarfail.py
x tarfail/test.tar
x tarfail/test.tar.bz2
$ cd tarfail
$ # ==================================================================
$ # Note that test.tar.bz2 is just test.tar, but bzip2'ed:
$ bzip2 -c test.tar | openssl dgst -sha256 ; openssl dgst -sha256 test.tar.bz2
f4fad25a0e7a451ed906b76846efd6d2699a65b40795b29553addc35bf9a75c8
SHA256(test.tar.bz2)= f4fad25a0e7a451ed906b76846efd6d2699a65b40795b29553addc35bf9a75c8
$ wc -c test.tar*  # these are not empty files
   10240 test.tar
      46 test.tar.bz2
   10286 total
$ tar tpvf test.tar  # no entries
$ tar tpvf test.tar.bz2  # no entries
$ # ==================================================================
$ # test.tar.bz2 works, but test.tar causes problems (tested in 2.7,
$ # 3.5, and 3.6):
$ python2.7 tarfail.py
opening /…/tarfail/test.tar.bz2
opening /…/tarfail/test.tar
E
======================================================================
ERROR: test_next (__main__.TestTarFileNext)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "tarfail.py", line 29, in test_next
    next_info = tar_file.next()
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/tarfile.py", line 2350, in next
    self.fileobj.seek(self.offset - 1)
IOError: [Errno 22] Invalid argument

----------------------------------------------------------------------
Ran 1 test in 0.005s

FAILED (errors=1)
$ python3.5 tarfail.py
opening /…/tarfail/test.tar.bz2
opening /…/tarfail/test.tar
E
======================================================================
ERROR: test_next (__main__.TestTarFileNext)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "tarfail.py", line 29, in test_next
    next_info = tar_file.next()
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/tarfile.py", line 2273, in next
    self.fileobj.seek(self.offset - 1)
OSError: [Errno 22] Invalid argument

----------------------------------------------------------------------
Ran 1 test in 0.066s

FAILED (errors=1)
$ python3.6 tarfail.py
opening /…/tarfail/test.tar.bz2
opening /…/tarfail/test.tar
E
======================================================================
ERROR: test_next (__main__.TestTarFileNext)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "tarfail.py", line 29, in test_next
    next_info = tar_file.next()
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/tarfile.py", line 2279, in next
    self.fileobj.seek(self.offset - 1)
OSError: [Errno 22] Invalid argument

----------------------------------------------------------------------
Ran 1 test in 0.090s

FAILED (errors=1)
```
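For reference, an entry-less archive written by ``tarfile`` itself is exactly the 10240 bytes reported by ``wc`` above: two 512-byte end-of-archive blocks of NULs, padded with NULs to one full 20-block record. The sketch below (Python 3, in memory, no files) builds one and reads it back with ``getmembers()``. That call returns the member list already collected while opening, so as far as I can tell from the CPython sources it does not go back through ``TarFile.next()``; treat that as an observation, not a documented guarantee.

```python
import io
import tarfile

# Write an archive with no members; close() (via the with block) emits the
# end-of-archive blocks and pads the output to a full record.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w"):
    pass
data = buf.getvalue()

print(len(data))                        # 10240, matching `wc -c test.tar`
print(data.count(b"\0") == len(data))   # True: every byte is NUL

# mode "r:" reads the buffer as an uncompressed tar archive.
with tarfile.open(fileobj=io.BytesIO(data), mode="r:") as tar:
    members = tar.getmembers()
print(members)                          # []
```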

Here's the issue (as far as I can tell):

```
$ ipdb tarfail.py
> /…/tarfail/tarfail.py(3)<module>()
      2
----> 3 from __future__ import (
      4     absolute_import, division, print_function, unicode_literals,

ipdb> b /Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/tarfile.py:2350
Breakpoint 1 at /Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/tarfile.py:2350
ipdb> c
opening /…/tarfail/test.tar.bz2
> /Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/tarfile.py(2350)next()
   2349         if self.offset != self.fileobj.tell():
1> 2350             self.fileobj.seek(self.offset - 1)
   2351             if not self.fileobj.read(1):

ipdb> self.fileobj
<bz2.BZ2File object at 0x1067791d0>
ipdb> self.offset, self.fileobj.tell(), self.offset - 1
(0, 512, -1)
ipdb> c
opening /…/tarfail/test.tar
> /Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/tarfile.py(2350)next()
   2349         if self.offset != self.fileobj.tell():
1> 2350             self.fileobj.seek(self.offset - 1)
   2351             if not self.fileobj.read(1):

ipdb> self.fileobj
<open file u'/…/tarfail/test.tar', mode 'rb' at 0x10676dae0>
ipdb> self.offset, self.fileobj.tell(), self.offset - 1
(0, 512, -1)
ipdb> c
E
======================================================================
ERROR: test_next (__main__.TestTarFileNext)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "tarfail.py", line 29, in test_next
    next_info = tar_file.next()
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/tarfile.py", line 2350, in next
    self.fileobj.seek(self.offset - 1)
IOError: [Errno 22] Invalid argument

----------------------------------------------------------------------
Ran 1 test in 38.300s

FAILED (errors=1)
The program exited via sys.exit(). Exit status: True
> /…/tarfail/tarfail.py(3)<module>()
      2
----> 3 from __future__ import (
      4     absolute_import, division, print_function, unicode_literals,

ipdb> EOF
```

Apparently, ``bz2.BZ2File`` tolerates seeking to a negative offset, whereas regular file objects are not so forgiving and fail with ``[Errno 22] Invalid argument``. The offending line looks like it can be traced back to this commit (``git blame`` for each branch):

https://github.com/python/cpython/blame/2.7/Lib/tarfile.py#L2350
https://github.com/python/cpython/blame/3.3/Lib/tarfile.py#L2252
https://github.com/python/cpython/blame/3.4/Lib/tarfile.py#L2252
https://github.com/python/cpython/blame/3.5/Lib/tarfile.py#L2273
https://github.com/python/cpython/blame/3.6/Lib/tarfile.py#L2286

(My apologies for not catching this sooner.)
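A minimal Python 3 sketch of that difference: it writes the same all-zero 10240-byte archive to a temporary directory, plain and bzip2-compressed, reads one 512-byte block from each, and then calls ``seek(-1)``, which is what the unpatched ``next()`` ends up doing. ``try_seek_minus_one`` is a hypothetical helper. On Linux and macOS I would expect the plain file to fail with ``EINVAL``, while ``BZ2File`` simply rewinds to position 0 instead of raising.

```python
import bz2
import errno
import os
import tempfile


def try_seek_minus_one(fileobj):
    """Hypothetical helper: replay what the unpatched next() does on an
    empty archive. One 512-byte block has been read, TarFile.offset is
    still 0, so it seeks to 0 - 1."""
    fileobj.read(512)
    try:
        return fileobj.seek(-1)  # new position, if the seek is accepted
    except OSError as exc:
        return errno.errorcode.get(exc.errno, str(exc.errno))  # e.g. "EINVAL"


data = b"\0" * 10240  # an entry-less archive, like test.tar

with tempfile.TemporaryDirectory() as tmp:
    plain_path = os.path.join(tmp, "test.tar")
    bz2_path = os.path.join(tmp, "test.tar.bz2")
    with open(plain_path, "wb") as f:
        f.write(data)
    with bz2.open(bz2_path, "wb") as f:
        f.write(data)

    with open(plain_path, "rb") as f:
        plain_result = try_seek_minus_one(f)
    with bz2.BZ2File(bz2_path, "rb") as f:
        bz2_result = try_seek_minus_one(f)

print("plain file:", plain_result)
print("BZ2File:", bz2_result)
```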
msg289265 - Author: Matt B (posita) - Date: 2017-03-09 01:53
FWIW, the (offending) fix for #24259 first shipped on the 2.7 branch in 2.7.10. I've verified that 2.7.9 works as expected:

```
$ python -V
Python 2.7.9
$ python tarfail.py
opening /…/tarfail/test.tar.bz2
opening /…/tarfail/test.tar
.
----------------------------------------------------------------------
Ran 1 test in 0.010s

OK
```

So this should probably be considered a regression.
msg289408 - Author: Matt B (posita) - Date: 2017-03-10 20:21
I'm not sure if it helps at this point, but I've tried several "flavors" of apparently legitimate tar files with zero entries, created with ``tarfile`` itself, BSD tar, and GNU tar. All fail.

``tarfile`` module:

```
$ ( set -x ; cd /tmp || exit 1 ; python -V ; rm -fv test.tar ; python -c 'import os, tarfile ; fd = os.open("test.tar", os.O_WRONLY | os.O_CREAT | os.O_EXCL) ; f = os.fdopen(fd, "w") ; f = tarfile.open("test.tar", "w", f) ; f.close() ; f = tarfile.open("test.tar") ; print("okay so far; calling f.next()...") ; f.next()' ; openssl dgst -sha256 test.tar ; rm -fv test.tar )
+/bin/zsh:496> cd /tmp
+/bin/zsh:496> python -V
Python 2.7.13
+/bin/zsh:496> rm -v -fv test.tar
+/bin/zsh:496> python -c 'import os, tarfile ; fd = os.open("test.tar", os.O_WRONLY | os.O_CREAT | os.O_EXCL) ; f = os.fdopen(fd, "w") ; f = tarfile.open("test.tar", "w", f) ; f.close() ; f = tarfile.open("test.tar") ; print("okay so far; calling f.next()...") ; f.next()'
okay so far; calling f.next()...
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/tarfile.py", line 2350, in next
    self.fileobj.seek(self.offset - 1)
IOError: [Errno 22] Invalid argument
+/bin/zsh:496> openssl dgst -sha256 test.tar
SHA256(test.tar)= 84ff92691f909a05b224e1c56abb4864f01b4f8e3c854e4bb4c7baf1d3f6d652
+/bin/zsh:496> rm -v -fv test.tar
test.tar
```

BSD tar (OS X):

```
$ ( set -x ; cd /tmp || exit 1 ; tar --version ; rm -fv test.tar ; tar -cf test.tar -T /dev/null ; python -c 'import tarfile ; f = tarfile.open("test.tar") ; print("okay so far; calling f.next()...") ; f.next()' ; openssl dgst -sha256 test.tar ; rm -fv test.tar )
+/bin/zsh:499> cd /tmp
+/bin/zsh:499> tar --version
bsdtar 2.8.3 - libarchive 2.8.3
+/bin/zsh:499> rm -v -fv test.tar
+/bin/zsh:499> tar -cf test.tar -T /dev/null
+/bin/zsh:499> python -c 'import tarfile ; f = tarfile.open("test.tar") ; print("okay so far; calling f.next()...") ; f.next()'
okay so far; calling f.next()...
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/tarfile.py", line 2350, in next
    self.fileobj.seek(self.offset - 1)
IOError: [Errno 22] Invalid argument
+/bin/zsh:499> openssl dgst -sha256 test.tar
SHA256(test.tar)= 5f70bf18a086007016e948b04aed3b82103a36bea41755b6cddfaf10ace3c6ef
+/bin/zsh:499> rm -v -fv test.tar
test.tar
```

GNU tar (OS X via MacPorts):

```
$ ( set -x ; cd /tmp || exit 1 ; gnutar --version ; rm -fv test.tar ; gnutar -cf test.tar -T /dev/null ; python -c 'import tarfile ; f = tarfile.open("test.tar") ; print("okay so far; calling f.next()...") ; f.next()' ; openssl dgst -sha256 test.tar ; rm -fv test.tar )
+-zsh:23> cd /tmp
+-zsh:23> gnutar --version
tar (GNU tar) 1.29
Copyright (C) 2015 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Written by John Gilmore and Jay Fenlason.
+-zsh:23> rm -v -fv test.tar
+-zsh:23> gnutar -cf test.tar -T /dev/null
+-zsh:23> python -c 'import tarfile ; f = tarfile.open("test.tar") ; print("okay so far; calling f.next()...") ; f.next()'
okay so far; calling f.next()...
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/tarfile.py", line 2350, in next
    self.fileobj.seek(self.offset - 1)
IOError: [Errno 22] Invalid argument
+-zsh:23> openssl dgst -sha256 test.tar
SHA256(test.tar)= 84ff92691f909a05b224e1c56abb4864f01b4f8e3c854e4bb4c7baf1d3f6d652
+-zsh:23> rm -v -fv test.tar
test.tar
```
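The BSD tar archive hashes differently from the other two. One guess worth checking is that it differs only in padding (for example, just the two 512-byte end-of-archive blocks, without padding to a full 10240-byte record) rather than in content. A quick sketch for testing that guess: hash all-zero buffers of both sizes and compare the printed digests with the transcripts above. Either way, both buffers are read by ``tarfile`` as archives with no members, so both would reach the same failing ``next()`` path.

```python
import hashlib
import io
import tarfile

# Two all-zero candidates: just the two end-of-archive blocks, and those
# blocks padded out to one full 20-block (10240-byte) record.
candidates = {
    "1024 NUL bytes": b"\0" * 1024,
    "10240 NUL bytes": b"\0" * 10240,
}

member_counts = {}
for label, data in candidates.items():
    print(label, hashlib.sha256(data).hexdigest())
    # mode "r:" forces plain (uncompressed) reading of the in-memory buffer.
    with tarfile.open(fileobj=io.BytesIO(data), mode="r:") as tar:
        member_counts[label] = len(tar.getmembers())

print(member_counts)  # {'1024 NUL bytes': 0, '10240 NUL bytes': 0}
```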

The discussion in #24259 does not appear to consider this case; it seems to assume that an archive always has at least one entry, which is not always true.
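For contrast, a sketch of the case #24259 seems to have in mind: an archive with one member (hypothetical name ``hello.txt``, built in memory). Once a member has been read, ``TarFile.offset`` points past that member's data, so ``offset - 1`` is a valid, non-negative position and a loop of ``next()`` calls ends cleanly with ``None``.

```python
import io
import tarfile

# Build an archive with a single member (hypothetical name "hello.txt").
buf = io.BytesIO()
payload = b"hello\n"
with tarfile.open(fileobj=buf, mode="w") as tar:
    info = tarfile.TarInfo("hello.txt")
    info.size = len(payload)
    tar.addfile(info, io.BytesIO(payload))

buf.seek(0)
names = []
with tarfile.open(fileobj=buf, mode="r:") as tar:
    member = tar.next()  # the member already read while opening
    while member is not None:
        names.append(member.name)
        # Here tar.offset > 0, so seek(offset - 1) lands on a real byte.
        member = tar.next()

print(names)  # ['hello.txt']
```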
msg289409 - Author: Matt B (posita) - Date: 2017-03-10 20:31
This patch (also attached) seems to address this particular use case:

```
--- a/Lib/tarfile.py	2016-12-17 12:41:21.000000000 -0800
+++ b/Lib/tarfile.py	2017-03-10 12:23:34.000000000 -0800
@@ -2347,7 +2347,7 @@
 
         # Advance the file pointer.
         if self.offset != self.fileobj.tell():
-            self.fileobj.seek(self.offset - 1)
+            self.fileobj.seek(max(self.offset - 1, 0))
             if not self.fileobj.read(1):
                 raise ReadError("unexpected end of data")
 
```

However, I am unfamiliar with this code, especially in light of #24259, and haven't tested the change thoroughly. It needs review from someone who knows ``tarfile`` better.
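A sketch of what this patch does to the file position on an empty archive, replayed on an in-memory buffer standing in for ``self.fileobj`` (``fileobj`` and ``offset`` are illustrative names). The clamp avoids the negative seek, but the one-byte sanity read then leaves the position at 1 rather than 0, so the following 512-byte header read would start one byte late. With an all-zero archive that happens to be harmless, but the position is off.

```python
import io
import tarfile

# State after TarFile.__init__ has read the first (all-zero) block of an
# empty archive: file position 512, while TarFile.offset is still 0.
fileobj = io.BytesIO(b"\0" * 10240)
fileobj.read(512)
offset = 0

# The patched step: clamp the seek target to 0, then the one-byte check.
if offset != fileobj.tell():
    fileobj.seek(max(offset - 1, 0))
    if not fileobj.read(1):
        raise tarfile.ReadError("unexpected end of data")

print(fileobj.tell())  # 1, i.e. one byte past offset
```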
msg289417 - Author: Matt B (posita) - Date: 2017-03-10 23:00
After some consideration, I think this is probably more correct:

```
--- /Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/tarfile.py  2016-12-17 12:41:21.000000000 -0800
+++ tarfile.py  2017-03-10 14:57:15.000000000 -0800
@@ -2347,9 +2347,10 @@

         # Advance the file pointer.
         if self.offset != self.fileobj.tell():
-            self.fileobj.seek(self.offset - 1)
+            self.fileobj.seek(max(self.offset - 1, 0))
             if not self.fileobj.read(1):
                 raise ReadError("unexpected end of data")
+            self.fileobj.seek(self.offset)

         # Read the next block.
         tarinfo = None
```

But again, I'm no expert here.
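For experimenting without patching ``tarfile``, the revised logic can be pulled out into a standalone function. ``advance_file_pointer`` is a hypothetical name; ``fileobj`` and ``offset`` stand in for ``self.fileobj`` and ``self.offset``, and ``tarfile.ReadError`` is the exception the real code raises. This is a sketch of the patch's logic under those assumptions, not something run against tarfile's own test suite.

```python
import io
import tarfile


def advance_file_pointer(fileobj, offset):
    """Hypothetical standalone version of the revised 'advance the file
    pointer' step; fileobj/offset stand in for self.fileobj/self.offset."""
    if offset != fileobj.tell():
        # Clamp so an empty archive (offset == 0) never seeks to -1.
        fileobj.seek(max(offset - 1, 0))
        # The byte just before offset must exist, or the archive is truncated.
        if not fileobj.read(1):
            raise tarfile.ReadError("unexpected end of data")
        # When offset == 0 the read above leaves us at 1, so go back to
        # exactly where the next header starts.
        fileobj.seek(offset)


# Empty archive, positioned as after TarFile.__init__ read its first block.
empty = io.BytesIO(b"\0" * 10240)
empty.read(512)
advance_file_pointer(empty, 0)
print(empty.tell())  # 0

# Truncated input: the byte before the requested offset is missing.
try:
    advance_file_pointer(io.BytesIO(b"\0" * 10240), 20480)
except tarfile.ReadError as exc:
    print("ReadError:", exc)  # ReadError: unexpected end of data
```

For a non-empty archive (offset > 0) the final seek is a no-op, since the one-byte read already leaves the position at offset.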
History
Date | User | Action | Args
2022-04-11 14:58:44 | admin | set | github: 73946
2017-03-10 23:00:48 | posita | set | files: + tarfile.patch; messages: + msg289417
2017-03-10 22:59:13 | posita | set | files: - tarfile.patch
2017-03-10 20:31:03 | posita | set | files: + tarfile.patch; keywords: + patch; messages: + msg289409
2017-03-10 20:21:50 | posita | set | messages: + msg289408
2017-03-09 06:05:13 | serhiy.storchaka | set | nosy: + lars.gustaebel; type: crash -> behavior; versions: - Python 3.3, Python 3.4
2017-03-09 01:53:15 | posita | set | messages: + msg289265
2017-03-08 19:32:09 | posita | create |