New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
tarfile missing entries due to omitted uid/gid fields #60062
Comments
The tarfile module silently truncates the list of entries when reading a tar file if it sees an entry with a uid/gid field containing only spaces/NULs. I got such a tarball from Java Maven/plexus-archiver. I don't know whether they write such fields deliberately, but it seems reasonable to me, especially since they were providing the user/group names textually. I'd like to see two fixes - a None/-1/0 value for the uid/gid and not silently swallowing HeaderErrors in TarFile.next() (or at least documenting why it's being done). 0 would be consistent with the default value when writing, but None seems more honest. -1 seems hard to defend. Only tested on silly Python versions (2.6, PyPy-1.8), sorry. It's what I've got to hand, but I think this issue also applies to recent Python too going by looking at the hg trunk. |
I think the default has to be 0 for consistency with how other empty numeric fields are handled. In theory spaces and NULs are supposed to be equivalent terminators in numeric fields, but I've just noticed that plexus-archiver is also using leading spaces rather than leading zeros (against the spec), e.g. ' 10422\x00 '. tarfile currently supports this, which I think is good, so I think the right approach is to lstrip(' ') fields and then treat space as a terminator. |
Could you provide some sample data and code? I see the problem, but I cannot quite reproduce the behaviour you describe. In all of my testcases tarfile either throws an exception or successfully reads the archive, but never silently stops. |
See attached bad.tar. $ less bad.tar | cat
drwxr-xr-x 0/0 0 2012-09-05 20:04 foo/
-rw-rw-r-- uname/gname 0 2012-09-05 20:04 foo/a
$ python -c 'import tarfile; print(tarfile.open("bad.tar").getnames())'
['foo']
$ python -c 'import tarfile, patch; patch.patch_tarfile(); print (tarfile.open("bad.tar").getnames())'
['foo', 'foo/a'] I'm only allowed to attach one file via the tracker web UI, so patch.py will follow. Creation code for bad.tar, largely for my benefit: import java.io.FileOutputStream;
import java.io.IOException;
import org.codehaus.plexus.archiver.tar.TarOutputStream;
import org.codehaus.plexus.archiver.tar.TarEntry; class TarTest {
} |
patch.py attached - what I'm using as a workaround at the moment. |
Here's the changes from patch.py, put into patch format. I took out the inlining for ntb() and support for different tarfile APIs, and also replaced the use of .split(,1)[0] by .partition(). |
The secondary issue, which the patch doesn't address, is that TarFile.next() seems unpythonic; it treats any {Invalid,Empty,Truncated}HeaderError after offset 0 as EOF rather than propagating the exception. It looks deliberate, but I'm not sure why it would be done like that or if it should be changed. |
I have attached a patch file for test_tarfile.py that uses the attached 'bad.tar' to test this issue. After applying this test patch, put 'bad.tar' in Lib/test with the name 'bpo-15858.tar'. This tests Tarfile.next(), but the issue can be seen via the command line interface, which uses .next() under the covers: Pre-patch: Post-patch: |
test fails for me with provided bad.tar [as described in comment] but test passed after applying patch to tarfile. |
I think bpo-24514 (fixed in Py2.7.11) is a duplicate of this issue. |
Yes I think you are right. With Python 3.5.0: $ python3 -c 'import tarfile; print(tarfile.open("bad.tar").getnames())'
['foo', 'foo/a']
$ python3 -m tarfile --list bad.tar
foo/
foo/a The proposed fix here is slightly different: truncate from the first space, rather than trailing whitespace. |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
The text was updated successfully, but these errors were encountered: