New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
TarFile.getmember on directory requires trailing slash iff over 100 chars #66186
Comments
If a directory path is under 100 char you have to omit the trailing slash from the name passed to 'getmember'. If it is over 100 you have to include the trailing slash. As a work around I can use the private '_getmember' with 'normalize=True'. I tested on 2.7.2 and searched the release notes looking for a related fix since then. I couldn't find anything there, or here in the issue tracker. |
Could you please provide an example? |
Here is a script illustrating the issue. |
There is indeed special logic that triggers if the name is longer than 100 characters. Presumably it has a bug. Marking this as easy since it shouldn't be too hard, given the failure example, to figure out what is wrong and fix it (and turn the example into a unit test). It doesn't look like the relevant code has changed in python3, so the bug probably exists there as well. |
Apparently, the problem is located in TarInfo._proc_gnulong(). I attached a patch. When tarfile reads an archive, it strips trailing slashes from all filenames, except GNUTYPE_LONGNAME headers, which is a bug. tarfile creates GNU_FORMAT tar files by default, hence it uses an additional GNUTYPE_LONGNAME header for filenames >100 chars. That's why tarfile_issue.py fails if used with PAX_FORMAT, because PAX_FORMAT doesn't have this bug. |
Here is a 3.5 fix based on Lars Gustäbel's, with test. |
Added Matt Behrens test to Lars Gustäbel 2.7 version. |
This issue is 5 years old has 4 patches: it's far from being "newcomer friendly", I remove the "Easy" label. |
Any updates on this? |
So far, nobody proposed a pull request. So no, there is no update. Someone has to step in, dig into the issue, propose a fix, then someone else has to review the PR, and finally the PR should be merged. |
The original issue was twofold:
The second part is no longer an issue -- tested in 3.9 and 3.11 on MacOS. Currently the issue is that a trailing slash now doesn't work for lookup of dirs, no matter the size of name. This is inconsistent with the way shell commands work as well as various Python path related modules that tolerate trailing slash for dirs. This can cause users to wrongly assume a dir is absent in a tarfile, so I think it's worth fixing and I've added a PR with a test for both old and new issue. |
Well, the tar command strips trailing slashes (even from file paths), so it is reasonable to do this in getmember(). $ mkdir dir
$ touch dir/file
$ tar cf archive.tar dir
$ tar tf archive.tar dir
dir/
dir/file
$ tar tf archive.tar dir/
dir/
dir/file
$ tar tf archive.tar dir/file
dir/file
$ tar tf archive.tar dir/file/
dir/file
$ tar tf archive.tar dir/file////
dir/file |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
The text was updated successfully, but these errors were encountered: