This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

classification
Title: TarFile.getmember on directory requires trailing slash iff over 100 chars
Type: behavior Stage: resolved
Components: Library (Lib) Versions: Python 3.11, Python 3.10, Python 3.9
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: af, andrei.avk, lars.gustaebel, miss-islington, moloney, puppet, r.david.murray, serhiy.storchaka, vstinner, zigg
Priority: normal Keywords: patch

Created on 2014-07-16 03:40 by moloney, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Files
File name Uploaded Description
tarfile_issue.py moloney, 2014-07-16 19:57
issue21987.diff lars.gustaebel, 2014-07-23 09:57
issue21987_py3.5_with_test.patch zigg, 2014-07-26 00:06
issue21987_py2.7_with_test.patch puppet, 2014-08-02 08:54
Pull Requests
URL Status Linked
PR 30283 merged andrei.avk, 2021-12-28 16:58
PR 30737 merged miss-islington, 2022-01-21 07:40
PR 30738 merged miss-islington, 2022-01-21 07:40
Messages (15)
msg223167 - (view) Author: Brendan Moloney (moloney) Date: 2014-07-16 03:40
If a directory path is under 100 characters, you have to omit the trailing slash from the name passed to 'getmember'. If it is over 100 characters, you have to include the trailing slash.

As a workaround I can use the private '_getmember' with 'normalize=True'.

I tested on 2.7.2 and searched the release notes looking for a related fix since then. I couldn't find anything there, or here in the issue tracker.
msg223174 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-07-16 05:50
Could you please provide an example?
msg223264 - (view) Author: Brendan Moloney (moloney) Date: 2014-07-16 19:57
Here is a script illustrating the issue.
msg223432 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2014-07-18 20:50
There is indeed special logic that triggers if the name is longer than 100 characters.  Presumably it has a bug.  Marking this as easy since it shouldn't be too hard, given the failure example, to figure out what is wrong and fix it (and turn the example into a unit test).

It doesn't look like the relevant code has changed in python3, so the bug probably exists there as well.
msg223732 - (view) Author: Lars Gustäbel (lars.gustaebel) * (Python committer) Date: 2014-07-23 09:57
Apparently, the problem is located in TarInfo._proc_gnulong(). I attached a patch.

When tarfile reads an archive, it strips trailing slashes from all filenames, except GNUTYPE_LONGNAME headers, which is a bug. tarfile creates GNU_FORMAT tar files by default, hence it uses an additional GNUTYPE_LONGNAME header for filenames >100 chars. That's why tarfile_issue.py fails if used with PAX_FORMAT, because PAX_FORMAT doesn't have this bug.
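
The difference between the two formats can be seen by looking at the type flag of the first 512-byte header block that tarfile writes. The sketch below is illustrative only: `first_header_type` is a hypothetical helper, the format is passed explicitly rather than relying on the default, and offset 156 is the position of the type flag in a tar header block.

```python
import io
import tarfile


def first_header_type(fmt, name):
    """Hypothetical helper: write a single directory entry in the given
    format and return the type flag of the first header block."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=fmt) as tar:
        info = tarfile.TarInfo(name)
        info.type = tarfile.DIRTYPE
        tar.addfile(info)
    # The type flag is the single byte at offset 156 of a header block.
    return buf.getvalue()[156:157]


long_name = "d" * 120   # longer than the 100-byte name field
short_name = "d" * 20

gnu_long = first_header_type(tarfile.GNU_FORMAT, long_name)
pax_long = first_header_type(tarfile.PAX_FORMAT, long_name)
gnu_short = first_header_type(tarfile.GNU_FORMAT, short_name)

print(gnu_long == tarfile.GNUTYPE_LONGNAME)  # extra GNU long-name header
print(pax_long == tarfile.XHDTYPE)           # pax extended header instead
print(gnu_short == tarfile.DIRTYPE)          # no extension header needed
```

Only the GNU long-name path goes through TarInfo._proc_gnulong() when the archive is read back, which is consistent with the bug showing up for GNU_FORMAT and not for PAX_FORMAT.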
msg224014 - (view) Author: Matt Behrens (zigg) * Date: 2014-07-26 00:06
Here is a 3.5 fix based on Lars Gustäbel's, with test.
msg224542 - (view) Author: Daniel Eriksson (puppet) * Date: 2014-08-02 08:54
Added Matt Behrens' test to Lars Gustäbel's 2.7 version.
msg348618 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-07-29 11:32
This issue is 5 years old and has 4 patches: it's far from being "newcomer friendly", so I am removing the "Easy" label.
msg376370 - (view) Author: af (af) Date: 2020-09-04 15:01
Any updates on this?
msg376373 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-09-04 15:18
> Any updates on this?

So far, nobody proposed a pull request. So no, there is no update.

Someone has to step in, dig into the issue, propose a fix, then someone else has to review the PR, and finally the PR should be merged.
msg409261 - (view) Author: Andrei Kulakov (andrei.avk) * (Python triager) Date: 2021-12-28 17:08
The original issue was twofold:
1. below 100 char not working with trailing slash
2. over 100 char not working WITHOUT trailing slash

The second part is no longer an issue -- tested in 3.9 and 3.11 on macOS.

Currently the issue is that a trailing slash now doesn't work for lookup of dirs, no matter the size of name.

This is inconsistent with the way shell commands work as well as various Python path related modules that tolerate trailing slash for dirs.

This can cause users to wrongly assume a dir is absent in a tarfile, so I think it's worth fixing and I've added a PR with a test for both old and new issue.
msg409265 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2021-12-28 18:51
Well, the tar command strips trailing slashes (even from file paths), so it is reasonable to do this in getmember().

$ mkdir dir
$ touch dir/file
$ tar cf archive.tar dir
$ tar tf archive.tar dir
dir/
dir/file
$ tar tf archive.tar dir/
dir/
dir/file
$ tar tf archive.tar dir/file
dir/file
$ tar tf archive.tar dir/file/
dir/file
$ tar tf archive.tar dir/file////
dir/file
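
For code that has to run on interpreters without the fix, the same stripping can be done on the caller's side. The following is only a sketch of that idea, not the merged patch: `getmember_tolerant` and `build_archive` are hypothetical names, and the archive mirrors the dir and dir/file layout from the shell session above.

```python
import io
import tarfile


def getmember_tolerant(tar, name):
    """Hypothetical wrapper: drop trailing slashes before the lookup,
    mirroring what the tar command does. Falls back to the original
    name if stripping would leave an empty string."""
    return tar.getmember(name.rstrip("/") or name)


def build_archive():
    """Build an in-memory archive containing dir/ and dir/file."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.GNU_FORMAT) as tar:
        d = tarfile.TarInfo("dir")
        d.type = tarfile.DIRTYPE
        tar.addfile(d)
        payload = b"hello"
        f = tarfile.TarInfo("dir/file")
        f.size = len(payload)
        tar.addfile(f, io.BytesIO(payload))
    return buf.getvalue()


with tarfile.open(fileobj=io.BytesIO(build_archive()), mode="r") as tar:
    dir_member = getmember_tolerant(tar, "dir/")
    file_member = getmember_tolerant(tar, "dir/file////")

print(dir_member.name, dir_member.isdir())
print(file_member.name, file_member.isfile())
```

The wrapper strips slashes from file paths as well as directory paths, which matches the tar output above but may be more permissive than some callers want.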
msg411089 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2022-01-21 07:40
New changeset cfadcc31ea84617b1c73022ce54d4ae831333e8d by andrei kulakov in branch 'main':
bpo-21987: Fix TarFile.getmember getting a dir with a trailing slash (GH-30283)
https://github.com/python/cpython/commit/cfadcc31ea84617b1c73022ce54d4ae831333e8d
msg411092 - (view) Author: miss-islington (miss-islington) Date: 2022-01-21 08:06
New changeset 1d11fdd3eeff77ba600278433b7ab0ce4d2a7f3b by Miss Islington (bot) in branch '3.10':
bpo-21987: Fix TarFile.getmember getting a dir with a trailing slash (GH-30283)
https://github.com/python/cpython/commit/1d11fdd3eeff77ba600278433b7ab0ce4d2a7f3b
msg411391 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2022-01-23 17:54
New changeset 94d6434ba7ec3e4b154e515c5583b0b665ab0b09 by Miss Islington (bot) in branch '3.9':
[3.9] bpo-21987: Fix TarFile.getmember getting a dir with a trailing slash (GH-30283) (GH-30738)
https://github.com/python/cpython/commit/94d6434ba7ec3e4b154e515c5583b0b665ab0b09
History
Date  User  Action  Args
2022-04-11 14:58:06  admin  set  github: 66186
2022-01-23 17:54:59  serhiy.storchaka  set  status: open -> closed; stage: patch review -> resolved; resolution: fixed; versions: + Python 3.9, Python 3.10, Python 3.11, - Python 2.7, Python 3.5
2022-01-23 17:54:22  serhiy.storchaka  set  messages: + msg411391
2022-01-21 08:06:05  miss-islington  set  messages: + msg411092
2022-01-21 07:40:46  miss-islington  set  pull_requests: + pull_request28925
2022-01-21 07:40:44  serhiy.storchaka  set  messages: + msg411089
2022-01-21 07:40:41  miss-islington  set  nosy: + miss-islington; pull_requests: + pull_request28924
2021-12-28 18:51:01  serhiy.storchaka  set  messages: + msg409265
2021-12-28 17:08:06  andrei.avk  set  messages: + msg409261
2021-12-28 16:58:04  andrei.avk  set  nosy: + andrei.avk; pull_requests: + pull_request28497
2020-09-04 15:18:45  vstinner  set  messages: + msg376373
2020-09-04 15:01:43  af  set  nosy: + af; messages: + msg376370
2019-07-29 11:32:48  vstinner  set  keywords: - easy; nosy: + vstinner; messages: + msg348618
2014-08-04 14:07:02  ezio.melotti  set  stage: test needed -> patch review
2014-08-02 08:54:22  puppet  set  files: + issue21987_py2.7_with_test.patch; nosy: + puppet; messages: + msg224542
2014-07-26 00:06:33  zigg  set  files: + issue21987_py3.5_with_test.patch; versions: + Python 3.5; nosy: + zigg; messages: + msg224014
2014-07-23 09:57:06  lars.gustaebel  set  files: + issue21987.diff; keywords: + patch; messages: + msg223732
2014-07-18 20:50:56  r.david.murray  set  keywords: + easy; nosy: + r.david.murray; messages: + msg223432
2014-07-16 19:57:48  moloney  set  files: + tarfile_issue.py; messages: + msg223264
2014-07-16 05:50:13  serhiy.storchaka  set  nosy: + lars.gustaebel, serhiy.storchaka; messages: + msg223174; stage: test needed
2014-07-16 03:40:59  moloney  create