This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer Guide.

classification
Title: wrong FNAME in tarfile if tgz extension is used
Type: behavior
Stage:
Components: Library (Lib)
Versions: Python 3.11, Python 3.10, Python 3.9
process
Status: open
Resolution:
Dependencies:
Superseder:
Assigned To:
Nosy List: maciej.mm.misiak, veaviticus
Priority: normal
Keywords:

Created on 2021-06-23 06:55 by maciej.mm.misiak, last changed 2022-04-11 14:59 by admin.

Messages (2)
msg396382 - Author: Maciej Misiak (maciej.mm.misiak) Date: 2021-06-23 06:55
This code is incomplete:

def _init_write_gz(self):
    ...
    if self.name.endswith(".gz"):
        self.name = self.name[:-3]
    # RFC1952 says we must use ISO-8859-1 for the FNAME field.
    self.__write(self.name.encode("iso-8859-1", "replace") + NUL)

If it is used in the following way, '.gz' is stripped properly and FNAME='somefile.tar':
    tarfile.open('somefile.tar.gz', 'w:gz')
but with
    tarfile.open('somefile.tgz', 'w:gz')
FNAME is incorrectly written as 'somefile.tgz'.
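A minimal sketch of suffix handling that would cover both spellings. The helper name `strip_gzip_suffix` is hypothetical and not part of tarfile, and mapping '.tgz' to '.tar' is an assumption about the desired behavior rather than an agreed fix:

```python
def strip_gzip_suffix(name: str) -> str:
    """Hypothetical helper: derive a gzip FNAME value from an archive name."""
    if name.endswith(".tgz"):
        # Assumption: '.tgz' is shorthand for '.tar.gz', so FNAME should end in '.tar'.
        return name[:-4] + ".tar"
    if name.endswith(".gz"):
        # Same behavior as the existing check: drop the '.gz' suffix.
        return name[:-3]
    return name


for archive in ("somefile.tar.gz", "somefile.tgz", "somefile.tar"):
    print(archive, "->", strip_gzip_suffix(archive))
```

With this mapping, both 'somefile.tar.gz' and 'somefile.tgz' would yield 'somefile.tar'.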
msg405468 - Author: Rob Nelson (veaviticus) Date: 2021-11-01 20:36
The code referenced in the previous comment is only reached for tar files built from streams.

The same (incorrect) check exists in gzip.py as well, where it affects the more common use case of building a tar.gz from a set of files on disk.

def _write_gzip_header(self, compresslevel):
    self.fileobj.write(b'\037\213')             # magic header
    self.fileobj.write(b'\010')                 # compression method
    try:
        # RFC 1952 requires the FNAME field to be Latin-1. Do not
        # include filenames that cannot be represented that way.
        fname = os.path.basename(self.name)
        if not isinstance(fname, bytes):
            fname = fname.encode('latin-1')
        if fname.endswith(b'.gz'):
            fname = fname[:-3]

This affects decompressing the file with 7zip, which respects the FNAME value and thus attempts to create a new file with the same name as the file it is currently decompressing. If you extract to another directory instead, it creates a tar file named "foo.tgz", which is confusing to users who are expecting a .tar file.

You can very easily reproduce this:

import tarfile
f = tarfile.open("test.tgz", mode="w:gz")
f.close()

and then "extract" the file with 7zip
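
The FNAME value can also be checked without 7zip by reading the gzip header directly. The following is a sketch only: `read_gzip_fname` is a hypothetical helper written for this illustration, following the header layout in RFC 1952, and the archives are created in a temporary directory. The value printed for the .tgz archive depends on the Python version in use.

```python
import os
import struct
import tarfile
import tempfile


def read_gzip_fname(path):
    """Return the FNAME field of a gzip file, or None if the field is absent."""
    with open(path, "rb") as f:
        magic, method, flags = struct.unpack("<2sBB", f.read(4))
        if magic != b"\x1f\x8b":
            raise ValueError("not a gzip file")
        f.read(6)  # skip MTIME (4 bytes), XFL and OS (1 byte each)
        if flags & 0x04:  # FEXTRA: 2-byte little-endian length, then payload
            (xlen,) = struct.unpack("<H", f.read(2))
            f.read(xlen)
        if not flags & 0x08:  # FNAME flag not set
            return None
        raw = bytearray()
        # FNAME is a zero-terminated Latin-1 string.
        while (byte := f.read(1)) not in (b"", b"\x00"):
            raw += byte
        return raw.decode("latin-1")


with tempfile.TemporaryDirectory() as tmp:
    for archive in ("test.tar.gz", "test.tgz"):
        path = os.path.join(tmp, archive)
        tarfile.open(path, mode="w:gz").close()
        print(archive, "->", read_gzip_fname(path))
```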
History
Date                 User              Action  Args
2022-04-11 14:59:47  admin             set     github: 88661
2022-01-16 17:58:03  iritkatriel       set     type: behavior; versions: - Python 3.6, Python 3.7, Python 3.8
2021-11-01 20:36:05  veaviticus        set     nosy: + veaviticus; messages: + msg405468
2021-06-23 06:55:14  maciej.mm.misiak  create