I propose changing tarfile.DEFAULT_FORMAT to be tarfile.PAX_FORMAT , rather than the legacy tarfile.GNU_FORMAT for Python 3.8. This would offer several benefits:
• Removes limitations of the old GNU tar format, including in max UID/GID values and bits in device major and minor numbers, and is the most flexible and feature-rich tar format currently
• Encodes all filenames as UTF-8 in a portable way, ensuring consistent and correct handling on all platforms, avoid errors like [this one](https://stackoverflow.com/questions/19902544/tarfile-produce-garbled-file-name-in-the-tar-gz-archivement) and generally ensure expected, sensible defaults
• Is the current interoperable POSIX standard, used by all modern platforms (Linux, Unix, macOS, and third-party unarchivers on Windows) rather than a vendor-specific extension like GNU tar
• Backwards compatible with any unarchiver capable of reading ustar format, unlike GNU tar as the extended pax headers will just be ignored
• Fixes bpo-30661, support tarfile.PAX_FORMAT in shutil.make_archive (was proposed as a fix to the same, but it was never followed up on and the issue remains open)
This change would have no effect on reading existing archives, only writing new ones, and should be broadly compatible with any remotely modern system, as pax support is included in all the widely used libraries/systems:
* POSIX 2001 (major Unix vendors), released in 2001 (18 years ago)
* GNU tar 1.14 (Linux, etc), released in 2004 (15 years ago)
* bsdtar/libtar ~1.2.51 (BSD, macOS, etc), at least as of 2006 (13 years ago), with significant bug fixes up through 2011 (8 years ago)
* 7-zip (Windows) at some point before 2011 (>8 years ago), with significant bug fixes up to 2011 (8 years ago)
* Python 2.6, released in 2008 (11 years ago)
Furthermore, essentially every existing archiver supports ustar format, which would allow interoperability on very old/exotic platforms that don't support pax for some reason (and would certainly not support GNU). Therefore, it should be more than safe to make the change now, with archivers on the three major platforms supporting the modern standard for nearly a decade, and any esoteric ones at least as likely to support the POSIX standard as the vendor-specific GNU extension.
Is there any particular reason why we shouldn't make this change? Is there a particular group/list I should contact to follow up about seeing this implemented? It seems it should only require a one-line change [here](https://stackoverflow.com/questions/19902544/tarfile-produce-garbled-file-name-in-the-tar-gz-archivement), aside from updating the docs and presumably the NEWS file, which I would be willing to do (I would think it should make a fairly straightforward first contribution).
|