This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

Author: lars.gustaebel
Recipients: Thomas Güttler, ethan.furman, guettli, lars.gustaebel, martin.panter
Date: 2015-05-28 07:42:53
Message-id: <1432798974.11.0.995157294499.issue24259@psf.upfronthosting.co.za>
I have written a test for the issue, so that we have a basis for discussion.

There are four different scenarios in which an unexpected EOF can occur:

1. inside a metadata block,
2. directly after a metadata block,
3. inside a data segment, or
4. directly after a data segment (i.e. a missing end-of-archive marker).
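
To make the four scenarios concrete, the following sketch builds a one-member archive in memory and computes where to cut it for each case. It is not the test mentioned above; `build_archive` and `truncation_points` are hypothetical helpers, and the sketch relies on the `offset` (start of the header) and `offset_data` (start of the data) attributes that `tarfile.TarInfo` objects carry in CPython's implementation.

```python
import io
import tarfile

BLOCKSIZE = 512  # tar headers and data are stored in 512-byte blocks


def build_archive(name="member.bin", size=1000):
    """Return the bytes of an uncompressed ustar archive with one regular file."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tf:
        info = tarfile.TarInfo(name)
        info.size = size
        tf.addfile(info, io.BytesIO(b"x" * size))
    return buf.getvalue()


def truncation_points(data):
    """Map each EOF scenario (1-4) to a byte offset at which to cut `data`.

    Only the first member is considered.
    """
    with tarfile.open(fileobj=io.BytesIO(data), mode="r:") as tf:
        member = tf.getmembers()[0]
    # The data segment is padded up to the next 512-byte boundary.
    padded = -(-member.size // BLOCKSIZE) * BLOCKSIZE
    return {
        1: member.offset + 100,                    # arbitrary point inside the header block
        2: member.offset_data,                     # directly after the header block
        3: member.offset_data + member.size // 2,  # inside the data segment
        4: member.offset_data + padded,            # after the data, no end-of-archive marker
    }
```

With the default `size=1000`, the cut points are 100, 512, 1012 and 1536 bytes; slicing the archive bytes at each point (`data[:cut]`) yields one truncated variant per case.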

Case #1 is taken care of (TruncatedHeaderError).

Case #4 is merely a violation of the standard, which is negligible.
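
If case #4 ever needs to be checked, the ustar format ends an archive with two consecutive 512-byte blocks of zero bytes, so a helper can look for them right after the last member's padded data. `has_end_marker` is a hypothetical name; the sketch assumes an uncompressed archive whose headers can all be read, and it ignores sparse members.

```python
import io
import tarfile

BLOCKSIZE = 512
EOF_MARKER = b"\0" * (2 * BLOCKSIZE)  # two zero-filled 512-byte blocks


def has_end_marker(data):
    """Return True if two zero blocks follow the last member's padded data."""
    with tarfile.open(fileobj=io.BytesIO(data), mode="r:") as tf:
        members = tf.getmembers()
    if not members:
        # An empty archive should consist of the marker alone.
        return data[:len(EOF_MARKER)] == EOF_MARKER
    last = members[-1]
    # End of the last data segment, rounded up to a whole block.
    end = last.offset_data + -(-last.size // BLOCKSIZE) * BLOCKSIZE
    return data[end:end + len(EOF_MARKER)] == EOF_MARKER
```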

Cases #2 and #3 are essentially the same. If a data segment is empty or incomplete, data was lost when the archive was created, and that should not go unnoticed when the archive is read (see _FileInFile.read() for the code in question).
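
The check that cases #2 and #3 call for boils down to comparing the number of bytes actually returned with the number the header announced. The class below is a simplified, hypothetical stand-in for that logic, not the code of `_FileInFile.read()`; it ignores sparse files and raises a made-up `ShortReadError`.

```python
import io


class ShortReadError(Exception):
    """Hypothetical error: a data segment ended before its declared size."""


class CheckedSegment:
    """Read `size` bytes starting at `offset` in `fileobj`, failing on a short read."""

    def __init__(self, fileobj, offset, size):
        self.fileobj = fileobj
        self.offset = offset
        self.size = size
        self.position = 0  # bytes of the segment consumed so far

    def read(self, n=-1):
        remaining = self.size - self.position
        if n < 0 or n > remaining:
            n = remaining
        self.fileobj.seek(self.offset + self.position)
        chunk = self.fileobj.read(n)
        if len(chunk) != n:
            # The underlying file ended inside the segment: data was lost.
            raise ShortReadError(
                f"expected {n} bytes at offset {self.offset + self.position}, "
                f"got {len(chunk)}")
        self.position += n
        return chunk
```

An empty segment (case #2) is caught as well, because the first read returns zero bytes where the header promised more.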

The problem is that, even after we have fixed cases #2 and #3, we have no reliable way to detect an incomplete data segment unless we read it and count the bytes. If we simply iterate over the TarFile (e.g. with TarFile.list()), the archive will appear intact. That is because TarFile.next() seeks from one metadata block to the next, and seek() on a regular file does not fail when the target lies beyond the end of the file. So we cannot detect that we have moved past the end of the archive unless we insist that every tar we read is standards-compliant and ends with an end-of-archive marker (see case #4), which we probably should not.
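
Why seeking hides the truncation is easy to reproduce: `seek()` on an `io.BytesIO` (or a regular file) moves past the end of the data without complaint, and only a later `read()` returns `b""`. A cheap middle ground between a plain `seek()` and reading the whole segment is to seek to one byte before the target offset and read that single byte. `seek_checked` is a hypothetical helper illustrating the idea, not part of `tarfile`; it assumes a seekable file object.

```python
import io


def seek_checked(fileobj, offset):
    """Seek to `offset`, raising EOFError if it lies beyond the end of the data."""
    if offset > 0:
        fileobj.seek(offset - 1)
        # One byte must exist just before the target; otherwise we are past EOF.
        if not fileobj.read(1):
            raise EOFError(f"offset {offset} is beyond the end of the file")
    else:
        fileobj.seek(0)
    return fileobj.tell()


f = io.BytesIO(b"abc")
f.seek(100)   # a plain seek past the end succeeds silently
f.read()      # ... and the problem only shows up here, as b""
```

Applied to the next header's offset, this catches cases #2 and #3 without reading the skipped data, while an archive that merely lacks its end-of-archive marker (case #4) still passes.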

Three options come to mind:

1. Add a warning to the documentation that, to test the integrity of an archive, the user has to read through all the data segments.
2. Use read() instead of seek() in TarFile.next() to advance the file pointer. This has a negative impact on performance in most cases.
3. Insist on an end-of-archive marker. This has the disadvantage that users may get an exception even though all their data is intact.
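
Option 1 puts the burden on the user. A minimal sketch of such an integrity check, using only the public `tarfile` API (`tarfile.open`, `TarFile.extractfile`, `TarInfo.isfile`, `TarInfo.size`), reads every regular member's data and compares the byte count with the header. Depending on the Python version, `tarfile` may itself raise `tarfile.ReadError` on truncated data, so the sketch treats that exception (and an `EOFError` from a truncated compressed stream) as a failed check; `archive_is_intact` is a hypothetical name.

```python
import io
import tarfile


def archive_is_intact(fileobj, chunk_size=64 * 1024):
    """Return True if every regular member's data can be read in full."""
    try:
        with tarfile.open(fileobj=fileobj, mode="r:*") as tf:
            for member in tf:
                if not member.isfile():
                    continue
                count = 0
                with tf.extractfile(member) as extracted:
                    while True:
                        chunk = extracted.read(chunk_size)
                        if not chunk:
                            break
                        count += len(chunk)
                # A short count means the data segment was truncated.
                if count != member.size:
                    return False
    except (tarfile.ReadError, EOFError):
        return False
    return True
```

An archive that only lacks its end-of-archive marker (case #4) still passes this check, in line with treating case #4 as negligible.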

History
Date                 User            Action  Args
2015-05-28 07:42:54  lars.gustaebel  set     recipients: + lars.gustaebel, guettli, ethan.furman, martin.panter, Thomas Güttler
2015-05-28 07:42:54  lars.gustaebel  set     messageid: <1432798974.11.0.995157294499.issue24259@psf.upfronthosting.co.za>
2015-05-28 07:42:54  lars.gustaebel  link    issue24259 messages
2015-05-28 07:42:53  lars.gustaebel  create