This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: tarfile.TarFile may write corrupt files if not closed
Type: behavior Stage:
Components: Library (Lib) Versions: Python 3.7
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: lars.gustaebel, lordmauve
Priority: normal Keywords:

Created on 2018-09-13 17:20 by lordmauve, last changed 2022-04-11 14:59 by admin.

Messages (2)
msg325266 - (view) Author: Daniel Pope (lordmauve) * Date: 2018-09-13 17:20
A tarfile.TarFile object open for writing may silently write corrupt tar files if it is destroyed before being closed.

While explicitly calling close() or using the object as a context manager is recommended, I would not expect this in basic usage.

There are two steps needed for a TarFile to be closed properly:

* According to https://github.com/python/cpython/blob/3.7/Lib/tarfile.py#L1726, two zero blocks must be written (though GNU tar seems to work even if these are absent)
* The underlying fileobj (an io.BufferedWriter) must then be flushed

A BufferedWriter is flushed in its __del__(); the problem is that TarFile objects form a reference cycle with their TarInfo members due to this line, which has the comment "Not Needed": https://github.com/python/cpython/blob/3.7/Lib/tarfile.py#L1801

Under PEP-442, when the TarFile becomes unreferenced the following Cycle Isolate is formed:

    TarInfo <=> TarFile -> BufferedWriter -> FileIO

Finalisers for these objects are run in an undefined order. If the FileIO finaliser is run before the BufferedWriter finaliser, then the fd is closed, buffered data in the BufferedWriter is not committed to disk, and the tar file is corrupt.

Additionally, while ResourceWarning is issued if the BufferedWriter or FileIO are left unclosed, no such warning is emitted by the TarFile.
msg325269 - (view) Author: Daniel Pope (lordmauve) * Date: 2018-09-13 17:39
I have several suggestions for steps to address this:

1. Don't create reference cycles. TarInfo.tarfile does not appear to be a documented attribute (https://docs.python.org/3/library/tarfile.html#tarinfo-objects) and could perhaps be deleted.
2. Issue a ResourceWarning in TarFile.__del__() if the TarFile was not closed prior to finalisation. ResourceWarnings are ignored by default but this would help when debugging. Given that the file may be corrupted perhaps something more visible than a ResourceWarning is required.
3. Make TarFile.__del__() close the TarFile cleanly. This is only possible if we can guarantee the underlying fileobj is finalized later (eg. because we have eliminated the reference cycle).
History
Date User Action Args
2022-04-11 14:59:05adminsetgithub: 78843
2018-09-13 21:47:05ned.deilysetnosy: + lars.gustaebel
2018-09-13 17:39:44lordmauvesetmessages: + msg325269
2018-09-13 17:20:55lordmauvecreate