Author Joe Tsai
Recipients Joe Tsai
Date 2017-09-22.21:42:01
Message-id <1506116521.84.0.445900877205.issue31557@psf.upfronthosting.co.za>
In-reply-to
Content
The original V7 header allocates only 100 bytes to store the file path. If a path exceeds this length, then either the PAX or the GNU format must be used, both of which can represent arbitrarily long file paths. When doing so, most tar writers just store the first 100 bytes of the file path in the V7 header.
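
To make the truncation concrete, here is a minimal sketch in plain Python. It does not touch the tarfile module; the helper name v7_name_field and the sample path are made up for illustration, and it assumes the writer simply keeps the first 100 bytes of the UTF-8 encoded path, as described above.

```python
V7_NAME_SIZE = 100  # size in bytes of the name field in the original V7 header


def v7_name_field(path: str) -> bytes:
    """Hypothetical helper: the bytes a truncating writer would
    store in the V7 name field for the given path."""
    return path.encode("utf-8")[:V7_NAME_SIZE]


# A regular file whose path happens to have a slash as its 100th byte.
long_path = "d" * 99 + "/" + "data.txt"
stored = v7_name_field(long_path)

# The stored field is exactly 100 bytes long and ends with a slash,
# even though the full path names a regular file.
print(len(stored), stored.endswith(b"/"))
```

The full path would still be carried by the PAX or GNU record; only the V7 field is affected by the truncation.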

When reading, a proper reader should disregard the contents of the V7 name field if a preceding PAX or GNU header for the same entry overrides it.

This is currently not the case with the tarfile module, which has the following check (https://github.com/python/cpython/blob/c7cc14a825ec156c76329f65bed0d0bd6e03d035/Lib/tarfile.py#L1054-L1057):
    # Old V7 tar format represents a directory as a regular
    # file with a trailing slash.
    if obj.type == AREGTYPE and obj.name.endswith("/"):
        obj.type = DIRTYPE

This check should be further constrained to only activate when no prior PAX or GNU record overrides the value of obj.name. This check was the source of a bug that caused tarfile to report a regular file as a directory: the file path was extra long, and when the tar writer truncated the path to the first 100 bytes, it happened to end on a slash.
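
One way the constrained check could look, as a rough sketch rather than a patch against tarfile: the function resolve_type and its name_overridden parameter are hypothetical, and the type constants are redefined locally (with the same values tarfile uses) so that the example is self-contained.

```python
AREGTYPE = b"\0"  # regular file, old V7 style
DIRTYPE = b"5"    # directory


def resolve_type(typeflag: bytes, v7_name: str, name_overridden: bool) -> bytes:
    """Hypothetical helper: apply the V7 trailing-slash rule only
    when the V7 name field is authoritative.

    name_overridden is assumed to be True when a preceding PAX or
    GNU record supplied the real path for this entry, in which case
    the (possibly truncated) V7 name says nothing about the type.
    """
    if typeflag == AREGTYPE and not name_overridden and v7_name.endswith("/"):
        return DIRTYPE
    return typeflag


# Truncated V7 name that happens to end on a slash, with the real
# path supplied by a PAX or GNU record: the type is left alone.
truncated = "d" * 99 + "/"
print(resolve_type(AREGTYPE, truncated, name_overridden=True))

# Genuine old-style V7 directory entry: still promoted to DIRTYPE.
print(resolve_type(AREGTYPE, "docs/", name_overridden=False))
```

How tarfile would actually track whether the name was overridden is left open here; the sketch only shows the extra condition on the existing check.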