tarfile fails to extract archive (handled fine by gnu tar and bsdtar) #68702
Comments
The extraction fails when calling tarfile.open on this archive: http://archive.apache.org/dist/commons/logging/source/commons-logging-1.1.2-src.tar.gz
After some investigation: the file can be extracted with GNU tar and bsdtar, and the gzip compression is not the issue. If I gunzip the tar.gz to a tar and call tarfile on the plain tar, the problem is the same. Also, this archive was most likely created on Windows (based on the
The error trace is slightly different on 2.7 and 3.4, but similar.
On 2.7:
>>> TarFile.taropen(name)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/tarfile.py", line 1705, in taropen
    return cls(name, mode, fileobj, **kwargs)
  File "/usr/lib/python2.7/tarfile.py", line 1574, in __init__
    self.firstmember = self.next()
  File "/usr/lib/python2.7/tarfile.py", line 2335, in next
    raise ReadError(str(e))
tarfile.ReadError: invalid header
On 3.4:
>>> TarFile.taropen(name)
Traceback (most recent call last):
  File "/usr/lib/python3.4/tarfile.py", line 180, in nti
    n = int(nts(s, "ascii", "strict") or "0", 8)
ValueError: invalid literal for int() with base 8: ' '
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/usr/lib/python3.4/tarfile.py", line 2248, in next
    tarinfo = self.tarinfo.fromtarfile(self)
  File "/usr/lib/python3.4/tarfile.py", line 1083, in fromtarfile
    obj = cls.frombuf(buf, tarfile.encoding, tarfile.errors)
  File "/usr/lib/python3.4/tarfile.py", line 1032, in frombuf
    obj.uid = nti(buf[108:116])
  File "/usr/lib/python3.4/tarfile.py", line 182, in nti
    raise InvalidHeaderError("invalid header")
tarfile.InvalidHeaderError: invalid header
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.4/tarfile.py", line 1595, in taropen
    return cls(name, mode, fileobj, **kwargs)
  File "/usr/lib/python3.4/tarfile.py", line 1469, in __init__
    self.firstmember = self.next()
  File "/usr/lib/python3.4/tarfile.py", line 2260, in next
    raise ReadError(str(e))
tarfile.ReadError: invalid header |
Note: the tracebacks above are from calling taropen on the gunzipped tar.gz |
The problem is that the tar archive has empty uid and gid fields, i.e. 7 spaces terminated with a null-byte. I attached a patch that solves the problem. |
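To make the failure mode concrete, here is a minimal sketch of how an octal header field that holds only spaces trips up a strict parser, and how stripping trailing whitespace before conversion turns it into zero. The helper names (`decode_field`, `parse_octal_strict`, `parse_octal_lenient`) are hypothetical and only loosely modelled on the `nti()` line shown in the 3.4 traceback; this is not the actual patch.

```python
def decode_field(raw: bytes) -> str:
    """Cut a header field at its first NUL byte and decode it as ASCII."""
    nul = raw.find(b"\0")
    if nul != -1:
        raw = raw[:nul]
    return raw.decode("ascii")


def parse_octal_strict(raw: bytes) -> int:
    """Hypothetical strict parser: only a truly empty string falls back to "0"."""
    return int(decode_field(raw) or "0", 8)


def parse_octal_lenient(raw: bytes) -> int:
    """Hypothetical lenient parser: a whitespace-only field counts as zero."""
    return int(decode_field(raw).rstrip() or "0", 8)


if __name__ == "__main__":
    blank = b" " * 7 + b"\0"  # the uid/gid content described above
    try:
        parse_octal_strict(blank)
    except ValueError as exc:
        print("strict parser rejects the field:", exc)
    print("lenient parser yields:", parse_octal_lenient(blank))
```

The strict variant fails because seven spaces are a non-empty string, so the `or "0"` fallback never applies and `int()` has nothing to convert.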
lars: you are my hero! you rock. I picture you being able to read through tar binary headers while you sleep. I am in awe. |
You're welcome :-D |
I verified that the patch bpo-24514.diff (adding .rstrip()) also works on Python 2.7, and I verified that it works on Python 3.4 as well. I ran it on 2.7 against a fairly large test suite of tar files without problems. This is a +1 from me. Lars: do you think you could apply it to 2.7 too? |
Yes, Python 2.7 still gets bugfixes. However, there's still some work to do on the patch (maybe clean the code, write a test, add a NEWS entry). |
The patch is very simple, but this needs tests. At the very least, a simple tar file which reproduces this issue could be added to the tests. Taking this a step further would be writing some unit tests for the internal nti() and itn() functions, and perhaps also stn() and nts(). |
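One way to get a small reproducing archive without shipping the original file might be to build a header in memory and blank out its uid and gid fields. The sketch below assumes the standard ustar layout (uid at bytes 108-116 as in the traceback, gid at 116-124, checksum at 148-156); `make_archive_with_blank_ids` is a hypothetical helper, not part of the patch or the test suite.

```python
import io
import tarfile


def make_archive_with_blank_ids(payload: bytes = b"hello\n") -> bytes:
    """Build a one-member tar archive whose uid and gid fields are 7 spaces + NUL."""
    info = tarfile.TarInfo("hello.txt")
    info.size = len(payload)
    header = bytearray(info.tobuf(format=tarfile.USTAR_FORMAT))

    blank = b" " * 7 + b"\0"
    header[108:116] = blank  # uid field
    header[116:124] = blank  # gid field

    # The checksum is the byte sum of the header with the checksum
    # field itself counted as eight spaces, so it must be recomputed.
    header[148:156] = b" " * 8
    header[148:156] = b"%06o\0 " % sum(header)

    # Member data is padded to a 512-byte block; two zero blocks end the archive.
    data = payload + b"\0" * (-len(payload) % 512)
    return bytes(header) + data + b"\0" * 1024


if __name__ == "__main__":
    archive = make_archive_with_blank_ids()
    # On an interpreter without the fix this open() is expected to raise ReadError.
    with tarfile.open(fileobj=io.BytesIO(archive), mode="r:") as tar:
        for member in tar.getmembers():
            print(member.name, member.uid, member.gid)
```

Building the bytes on the fly keeps the test data readable in source form, at the cost of not exercising whatever other quirks the original Windows-made archive may have.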
I think a simple addition to the existing unittest for nti() will be enough. itn() seems well-tested, and nts() and stn() are not affected, because they don't operate on numbers. |
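Such an addition could look roughly like the sketch below. It calls `tarfile.nti()` directly, which is an internal, undocumented helper, and the class and method names are made up for illustration; the test that was actually committed may differ.

```python
import tarfile
import unittest


class EmptyNumberFieldTest(unittest.TestCase):
    """Hypothetical test: blank number fields should be read as zero."""

    def test_whitespace_only_field_is_zero(self):
        # 7 spaces terminated by a NUL byte, as found in the failing archive.
        self.assertEqual(tarfile.nti(b"       \0"), 0)

    def test_regular_octal_field_still_parses(self):
        # A normal NUL-terminated octal field must keep working.
        self.assertEqual(tarfile.nti(b"0000764\0"), 0o764)


if __name__ == "__main__":
    unittest.main()
```

On an interpreter that includes the fix both tests should pass; on an unpatched one the first is expected to fail with InvalidHeaderError.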
New changeset 301d7efac3de by Lars Gustäbel in branch '2.7':
New changeset 140b4b7b84bd by Lars Gustäbel in branch '3.4':
New changeset 1692065524cc by Lars Gustäbel in branch '3.5':
New changeset 08fad9037206 by Lars Gustäbel in branch 'default': |