classification
Title: tarfile: ignore_zeros = True won't raise exception even on invalid (non-zero) TARs
Type: behavior
Stage:
Components: Library (Lib)
Versions: Python 3.11, Python 3.10, Python 3.9
process
Status: open
Resolution:
Dependencies:
Superseder:
Assigned To:
Nosy List: iritkatriel, mxmlnkn
Priority: normal
Keywords:

Created on 2020-05-24 18:10 by mxmlnkn, last changed 2022-04-11 14:59 by admin.

Messages (4)
msg369816 - Author: mxmlnkn (mxmlnkn) Date: 2020-05-24 18:10
Normally, when opening an existing non-TAR file, e.g., a file with random data, an exception is raised:

tarfile.open( "foo.txt" )

    ---------------------------------------------------------------------------
    ReadError                                 Traceback (most recent call last)
    <ipython-input-53-aa60172c3e3b> in <module>()
    ----> 1 f = tarfile.open( "notes.txt", ignore_zeros = False )

    /usr/lib/python3.7/tarfile.py in open(cls, name, mode, fileobj, bufsize, **kwargs)
       1576                         fileobj.seek(saved_pos)
       1577                     continue
    -> 1578             raise ReadError("file could not be opened successfully")
       1579 
       1580         elif ":" in mode:

    ReadError: file could not be opened successfully

However, when ignore_zeros = True is specified, this check against invalid data seems to be turned off. Note that the data is >invalid<, not >zero<, and should therefore still raise an exception!

tarfile.open( "foo.txt", ignore_zeros = True )

Iterating over the opened tarfile also works without raising an exception; however, nothing is yielded, i.e., it behaves like an empty TAR instead of like an invalid TAR.
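
The behavior described above can be checked with a small self-contained script. This is only a sketch: the helper name `probe`, the file name, and the text payload are assumptions made for illustration, and the result for ignore_zeros = True may differ between Python versions, so it is printed rather than assumed.

```python
import os
import tarfile
import tempfile


def probe(path, **open_kwargs):
    """Return the member names, or the string "ReadError" if tarfile rejects the file."""
    try:
        with tarfile.open(path, **open_kwargs) as archive:
            return archive.getnames()
    except tarfile.ReadError:
        return "ReadError"


# 2048 bytes of plain text: four 512-byte blocks, none of which is a valid
# TAR header and none of which consists of zero bytes.
payload = (b"this is not a tar archive\n" * 100)[:2048]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "foo.txt")
    with open(path, "wb") as handle:
        handle.write(payload)

    default_result = probe(path)                     # strict default
    lenient_result = probe(path, ignore_zeros=True)  # behavior under discussion

print("default:          ", default_result)
print("ignore_zeros=True:", lenient_result)
```

If the second line prints an empty list instead of "ReadError", the invalid file was treated like an empty archive, which is the behavior reported here.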
msg411124 - Author: Irit Katriel (iritkatriel) * (Python committer) Date: 2022-01-21 13:02
I am unable to reproduce this on 3.11:

>>> import tarfile

>>> tarfile.open( "foo.txt" )
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/iritkatriel/src/cpython-1/Lib/tarfile.py", line 1613, in open
    return func(name, "r", fileobj, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/iritkatriel/src/cpython-1/Lib/tarfile.py", line 1679, in gzopen
    fileobj = GzipFile(name, mode + "b", compresslevel, fileobj)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/iritkatriel/src/cpython-1/Lib/gzip.py", line 174, in __init__
    fileobj = self.myfileobj = builtins.open(filename, mode or 'rb')
                               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
FileNotFoundError: [Errno 2] No such file or directory: 'foo.txt'



>>> tarfile.open( "foo.txt", ignore_zeros = True )
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/iritkatriel/src/cpython-1/Lib/tarfile.py", line 1613, in open
    return func(name, "r", fileobj, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/iritkatriel/src/cpython-1/Lib/tarfile.py", line 1679, in gzopen
    fileobj = GzipFile(name, mode + "b", compresslevel, fileobj)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/iritkatriel/src/cpython-1/Lib/gzip.py", line 174, in __init__
    fileobj = self.myfileobj = builtins.open(filename, mode or 'rb')
                               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
FileNotFoundError: [Errno 2] No such file or directory: 'foo.txt'
>>>
msg411178 - Author: mxmlnkn (mxmlnkn) Date: 2022-01-21 19:57
I think you misunderstood. foo.txt is a file that actually exists but contains non-TAR data. For example, try:

base64 /dev/urandom | head -c $(( 2048 ))  > foo.txt
python3 -c 'import tarfile; print(list(tarfile.open("foo.txt")))'

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/usr/lib/python3.9/tarfile.py", line 1616, in open
    raise ReadError("file could not be opened successfully")
tarfile.ReadError: file could not be opened successfully

python3 -c 'import tarfile; print(list(tarfile.open("foo.txt", ignore_zeros=True)))'

[]
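
Until this is resolved, one possible workaround is to run the default (strict) check first and only then open the file with ignore_zeros = True. The following is a minimal sketch, not a fix for tarfile itself: `open_checked` is a hypothetical helper name, and it relies on `tarfile.is_tarfile()`, which opens the file with default arguments. It only validates that the file can be opened strictly; invalid blocks later in the file would presumably still be skipped silently.

```python
import io
import os
import tarfile
import tempfile


def open_checked(path, **open_kwargs):
    """Hypothetical helper: require the strict default check to pass before opening."""
    if not tarfile.is_tarfile(path):
        raise tarfile.ReadError(f"{path!r} is not recognized as a TAR archive")
    return tarfile.open(path, **open_kwargs)


with tempfile.TemporaryDirectory() as tmp:
    # A file that exists but contains non-TAR, non-zero data.
    garbage = os.path.join(tmp, "foo.txt")
    with open(garbage, "wb") as handle:
        handle.write((b"this is not a tar archive\n" * 100)[:2048])

    # A small valid archive for comparison.
    good = os.path.join(tmp, "good.tar")
    data = b"hello\n"
    with tarfile.open(good, "w") as archive:
        info = tarfile.TarInfo("hello.txt")
        info.size = len(data)
        archive.addfile(info, io.BytesIO(data))

    try:
        open_checked(garbage, ignore_zeros=True).close()
        garbage_rejected = False
    except tarfile.ReadError:
        garbage_rejected = True

    with open_checked(good, ignore_zeros=True) as archive:
        good_names = archive.getnames()

print("garbage rejected:", garbage_rejected)
print("valid archive members:", good_names)
```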
msg411385 - Author: Irit Katriel (iritkatriel) * (Python committer) Date: 2022-01-23 17:21
Thank you for clarifying. I can reproduce this on 3.11.
History
Date                 User         Action  Args
2022-04-11 14:59:31  admin        set     github: 84934
2022-01-23 17:21:36  iritkatriel  set     resolution: out of date ->
                                          messages: + msg411385
                                          versions: + Python 3.9, Python 3.10, Python 3.11, - Python 3.7
2022-01-21 19:57:26  mxmlnkn      set     status: pending -> open
                                          messages: + msg411178
2022-01-21 13:02:04  iritkatriel  set     status: open -> pending
                                          nosy: + iritkatriel
                                          messages: + msg411124
                                          resolution: out of date
2020-05-24 18:10:44  mxmlnkn      create