classification
Title: TarFile.getmember on directory requires trailing slash iff over 100 chars
Type: behavior
Stage: patch review
Components: Library (Lib)
Versions: Python 3.5, Python 2.7
process
Status: open
Resolution:
Dependencies:
Superseder:
Assigned To:
Nosy List: lars.gustaebel, moloney, puppet, r.david.murray, serhiy.storchaka, zigg
Priority: normal
Keywords: easy, patch

Created on 2014-07-16 03:40 by moloney, last changed 2014-08-04 14:07 by ezio.melotti.

Files
File name                          Uploaded
tarfile_issue.py                   moloney, 2014-07-16 19:57
issue21987.diff                    lars.gustaebel, 2014-07-23 09:57
issue21987_py3.5_with_test.patch   zigg, 2014-07-26 00:06
issue21987_py2.7_with_test.patch   puppet, 2014-08-02 08:54
Messages (7)
msg223167 - Author: Brendan Moloney (moloney) Date: 2014-07-16 03:40
If a directory path is under 100 characters, you have to omit the trailing slash from the name passed to 'getmember'. If it is over 100 characters, you have to include the trailing slash.

As a workaround, I can use the private '_getmember' method with 'normalize=True'.

I tested on 2.7.2 and searched the release notes of later versions for a related fix, but couldn't find anything there or here in the issue tracker.
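The workaround above depends on the private `_getmember()` method. A minimal sketch of an alternative that uses only public API (`TarFile.getmembers()` and `TarInfo.name`) compares names with trailing slashes stripped on both sides. `find_member` is a hypothetical helper name, not part of `tarfile`, and the demo assumes a GNU_FORMAT archive holding a single directory with an arbitrary 120-character name.

```python
import io
import tarfile


def find_member(tar, name):
    """Hypothetical helper (not part of tarfile): return the TarInfo whose
    name matches `name`, ignoring trailing slashes on both sides."""
    wanted = name.rstrip("/")
    match = None
    for member in tar.getmembers():
        if member.name.rstrip("/") == wanted:
            match = member  # keep scanning: the last occurrence wins, as in getmember()
    if match is None:
        raise KeyError("filename %r not found" % name)
    return match


# Demo: a GNU-format archive holding one directory whose name exceeds 100 chars.
long_dir = "x" * 120
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w", format=tarfile.GNU_FORMAT) as tar:
    info = tarfile.TarInfo(long_dir)
    info.type = tarfile.DIRTYPE
    tar.addfile(info)  # directories carry no data, so no fileobj is needed
buf.seek(0)

with tarfile.open(fileobj=buf, mode="r") as tar:
    # Both spellings resolve to the same member, whatever getmember() does.
    with_slash = find_member(tar, long_dir + "/")
    without_slash = find_member(tar, long_dir)
```

Because `getmembers()` scans the whole archive, a caller doing many lookups may prefer to build a dict keyed on the stripped names once and reuse it.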
msg223174 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-07-16 05:50
Could you please provide an example?
msg223264 - Author: Brendan Moloney (moloney) Date: 2014-07-16 19:57
Here is a script illustrating the issue.
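The attached script is not reproduced here. As a stand-in, a minimal sketch along the same lines builds a GNU_FORMAT archive in memory with two directories (the names `short_dir` and a 120-character `d...d` are arbitrary assumptions) and reports whether `getmember()` finds each one with and without a trailing slash. Per the original report, an affected interpreter should find the short name only without the slash and the long name only with it; an interpreter that includes a fix may find all four. It uses `str.format` rather than f-strings so it should also run on the versions listed above.

```python
import io
import tarfile


def build_archive(dir_names, fmt=tarfile.GNU_FORMAT):
    """Return an in-memory tar archive (a BytesIO positioned at 0) that
    contains one empty directory entry per name in dir_names."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=fmt) as tar:
        for name in dir_names:
            info = tarfile.TarInfo(name)
            info.type = tarfile.DIRTYPE
            info.mode = 0o755
            tar.addfile(info)  # directories carry no data, so no fileobj
    buf.seek(0)
    return buf


def lookup_report(dir_names, fmt=tarfile.GNU_FORMAT):
    """Try getmember() with and without a trailing slash for each directory
    and record whether the lookup succeeded."""
    report = {}
    with tarfile.open(fileobj=build_archive(dir_names, fmt), mode="r") as tar:
        for name in dir_names:
            for candidate in (name, name + "/"):
                try:
                    tar.getmember(candidate)
                    report[candidate] = True
                except KeyError:
                    report[candidate] = False
    return report


short_dir = "short_dir"  # 9 characters
long_dir = "d" * 120     # more than 100 characters

report = lookup_report([short_dir, long_dir])
for candidate, found in sorted(report.items(), key=lambda kv: len(kv[0])):
    print("length={} trailing_slash={} found={}".format(
        len(candidate), candidate.endswith("/"), found))
```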
msg223432 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2014-07-18 20:50
There is indeed special logic that triggers if the name is longer than 100 characters.  Presumably it has a bug.  Marking this as easy since it shouldn't be too hard, given the failure example, to figure out what is wrong and fix it (and turn the example into a unit test).

It doesn't look like the relevant code has changed in python3, so the bug probably exists there as well.
msg223732 - Author: Lars Gustäbel (lars.gustaebel) * (Python committer) Date: 2014-07-23 09:57
Apparently, the problem is located in TarInfo._proc_gnulong(). I attached a patch.

When tarfile reads an archive, it strips trailing slashes from all directory names except those read from GNUTYPE_LONGNAME headers, which is a bug. tarfile creates GNU_FORMAT tar files by default, so it uses an additional GNUTYPE_LONGNAME header for filenames longer than 100 characters. That's why tarfile_issue.py behaves differently if used with PAX_FORMAT: PAX_FORMAT doesn't have this bug.
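The explanation above can be checked at the byte level. The sketch below writes a single directory entry and inspects the raw archive, assuming the standard ustar layout: 512-byte header blocks, a 100-byte name field at offset 0 and a one-byte typeflag at offset 156. It passes `format=` explicitly because the default format for new archives changed from GNU_FORMAT to PAX_FORMAT in Python 3.8. For a 120-character directory in GNU_FORMAT, the first header should be a `././@LongLink` entry with typeflag `L` (GNUTYPE_LONGNAME), and the next block holds the full name *with* its trailing slash, which is the slash the reading side did not strip. PAX_FORMAT instead emits an extended header with typeflag `x`, and a short directory gets one ordinary header with typeflag `5`.

```python
import io
import tarfile

BLOCK = 512            # size of a tar header block
NAME_FIELD = 100       # length of the ustar name field, starting at offset 0
TYPEFLAG_OFFSET = 156  # offset of the one-byte typeflag field


def header_summary(name, fmt):
    """Write one directory entry in format `fmt` and return a tuple of
    (typeflag of first header, name field of first header, second block
    with NUL padding stripped)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=fmt) as tar:
        info = tarfile.TarInfo(name)
        info.type = tarfile.DIRTYPE
        tar.addfile(info)
    data = buf.getvalue()  # TarFile.close() leaves a caller-supplied fileobj open
    first = data[:BLOCK]
    typeflag = first[TYPEFLAG_OFFSET:TYPEFLAG_OFFSET + 1]
    stored_name = first[:NAME_FIELD].rstrip(b"\0")
    second = data[BLOCK:2 * BLOCK].rstrip(b"\0")
    return typeflag, stored_name, second


long_dir = "d" * 120
short_dir = "short_dir"

gnu_long = header_summary(long_dir, tarfile.GNU_FORMAT)
pax_long = header_summary(long_dir, tarfile.PAX_FORMAT)
gnu_short = header_summary(short_dir, tarfile.GNU_FORMAT)

print(gnu_long[:2])      # (b'L', b'././@LongLink'): the GNUTYPE_LONGNAME header
print(gnu_long[2][-5:])  # b'dddd/': end of the long-name payload, slash included
print(pax_long[0])       # b'x': pax extended header instead
print(gnu_short[:2])     # (b'5', b'short_dir/'): plain directory header
```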
msg224014 - Author: Matt Behrens (zigg) * Date: 2014-07-26 00:06
Here is a 3.5 fix based on Lars Gustäbel's, with test.
msg224542 - Author: Daniel Eriksson (puppet) * Date: 2014-08-02 08:54
Added Matt Behrens's test to Lars Gustäbel's 2.7 version.
History
Date                 User              Action  Args
2014-08-04 14:07:02  ezio.melotti      set     stage: test needed -> patch review
2014-08-02 08:54:22  puppet            set     files: + issue21987_py2.7_with_test.patch
                                               nosy: + puppet
                                               messages: + msg224542
2014-07-26 00:06:33  zigg              set     files: + issue21987_py3.5_with_test.patch
                                               versions: + Python 3.5
                                               nosy: + zigg
                                               messages: + msg224014
2014-07-23 09:57:06  lars.gustaebel    set     files: + issue21987.diff
                                               keywords: + patch
                                               messages: + msg223732
2014-07-18 20:50:56  r.david.murray    set     keywords: + easy
                                               nosy: + r.david.murray
                                               messages: + msg223432
2014-07-16 19:57:48  moloney           set     files: + tarfile_issue.py
                                               messages: + msg223264
2014-07-16 05:50:13  serhiy.storchaka  set     nosy: + lars.gustaebel, serhiy.storchaka
                                               messages: + msg223174
                                               stage: test needed
2014-07-16 03:40:59  moloney           create