classification
Title: tarfile normalizes arcname
Type: behavior Stage:
Components: Library (Lib) Versions: Python 3.2, Python 2.7
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: lars.gustaebel Nosy List: lars.gustaebel, mkv, srid
Priority: normal Keywords:

Created on 2009-05-18 16:19 by mkv, last changed 2010-05-17 17:54 by srid. This issue is now closed.

Messages (7)
msg88033 - (view) Author: (mkv) Date: 2009-05-18 16:19
When creating tar archives using the tarfile module, requested arc names
are not respected. 

It is currently impossible to create a tar archive whose listing
would give:
$ tar tvf test.tar
./
./control
./prerm
./postinst

The actual result is:
$ tar tvf test.tar
./
control
prerm
postinst

This is caused by TarInfo's tobuf method calling normpath() on all file
names, even when the user has explicitly specified a certain name.
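
To illustrate the effect described above, here is a small sketch of what path normalization does to the requested names. It uses posixpath.normpath as a stand-in for the module's own normpath helper (an assumption made for illustration; the helper in the affected tarfile versions is not reproduced here).

```python
import posixpath

# Names the reporter wants stored in the archive.
requested = ["./", "./control", "./prerm", "./postinst"]

# Normalization collapses the leading "./" component, which is why
# the explicitly requested names do not survive.
normalized = [posixpath.normpath(name) for name in requested]

for before, after in zip(requested, normalized):
    print(f"{before!r} -> {after!r}")
```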
msg88150 - (view) Author: Lars Gustäbel (lars.gustaebel) * (Python committer) Date: 2009-05-21 09:18
So, what exactly are you trying to accomplish? Why do you need that?
msg88157 - (view) Author: (mkv) Date: 2009-05-21 14:44
I'm creating a debian package (.deb) for a system which uses busybox's
dpkg. A deb is an ar archive (Unix ar, not tar), which in turn
contains two tar archives. dpkg will first extract a tar archive called
control.tar.gz (or bz2) from the package, and from that tar it will
extract a file stored with the path "./control". 

The problem is that with the current implementation of tarfile it's
impossible to create a tar archive which would contain a file stored
with the path "./control". This means it's impossible to use tarfile to
create deb packages which would work with busybox's dpkg. 

I'm not 100% sure if that precise path is a requirement of the deb file
format, or if it is because of how busybox's dpkg is implemented. However
I have not seen a packaging guide or a deb package which wouldn't have
the control file stored as ./control in the tar archive.
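
As a rough sketch of the goal stated above, the following builds a gzip-compressed control tarball in memory with members stored under "./". It assumes a tarfile version that stores member names as given; the helper names and the control file contents are hypothetical, and this is not a complete .deb (the ar wrapper and the second tar archive are left out).

```python
import io
import tarfile


def build_control_tar(files):
    """Return the bytes of a .tar.gz whose members are named './<name>'.

    `files` maps a bare file name to its content as bytes.
    """
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name="./" + name)
            info.size = len(data)
            info.mode = 0o644
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def list_members(blob):
    """Return the member names stored in a .tar.gz given as bytes."""
    with tarfile.open(fileobj=io.BytesIO(blob), mode="r:gz") as tar:
        return tar.getnames()


# Hypothetical control file content, for illustration only.
blob = build_control_tar({"control": b"Package: demo\n"})
print(list_members(blob))
```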
msg88230 - (view) Author: Lars Gustäbel (lars.gustaebel) * (Python committer) Date: 2009-05-23 12:03
Apparently, the .deb file format is not explicit about that, but it
seems to be common practice to have all files prefixed with './'.

normpath is used all over tarfile; the crucial occurrences are in
TarFile.add() and TarInfo.get_info(). As you're using a unix-like system,
the easiest workaround is to replace the module level tarfile.normpath
function with a no-op.
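
A minimal sketch of that workaround follows. The helper name disable_normpath is hypothetical, and the sketch assumes an older tarfile that still exposes a module-level normpath; on versions without that attribute the guard does nothing. A plain namespace object stands in for such an older module so the effect can be shown without depending on the interpreter version.

```python
import posixpath
import tarfile
import types


def disable_normpath(module):
    """Replace module.normpath with an identity function if it exists.

    Returns True if the attribute was patched, False otherwise.
    """
    if hasattr(module, "normpath"):
        module.normpath = lambda path: path
        return True
    return False


# Stand-in for an old tarfile module that normalizes path names
# (an assumption for illustration, not the real module).
legacy = types.SimpleNamespace(normpath=posixpath.normpath)
disable_normpath(legacy)
print(legacy.normpath("./control"))

# On the real module the call is harmless when normpath is absent.
disable_normpath(tarfile)
```

Note that a patch like this changes behaviour for every user of the module in the same process, so it is best applied once, early, and only where needed.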

The original assumption for using normpath on all pathnames was to keep
the names in an archive clean and in their canonical form. Most
occurrences of normpath date back to the original 2003 version (cf.
r30613) and have never been touched.

But, I found nothing in POSIX about normalizing pathnames. GNU tar and
star both strip different leading path components like "./" and "../"
from pathnames, but neither removes "./" components from inside a
pathname, for example. This means that the usage of normpath seems more
or less unnecessary in tarfile.

I will create a patch that addresses these issues.

Thanks for your report.
msg88233 - (view) Author: (mkv) Date: 2009-05-23 12:39
Great, thanks for the speedy work :) 

Now if only issue4750 would get fixed for 2.7 as well ;)
msg92044 - (view) Author: Lars Gustäbel (lars.gustaebel) * (Python committer) Date: 2009-08-28 20:56
I have done some research in order to find a suitable behaviour for
tarfile. I wrote a script to test to what extent all the different tar
implementations transform input pathnames. The results can be found at
http://www.gustaebel.de/lars/tarfile/wwgtd.html.

My conclusion is the following: tarfile now does no pathname
transformation whatsoever except for converting absolute to relative
paths (to stay backwards compatible). This way tarfile is closer to
POSIX, applies less magic and gives more responsibility to the user.

Fixed in r74571 (trunk) and r74573 (py3k). Thanks for your report.
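
The behaviour described in this message can be checked with a short sketch like the one below, assuming an interpreter that includes the fix. The helper name stored_names and the arcname values are hypothetical; a temporary file is used only as a payload.

```python
import io
import os
import tarfile
import tempfile


def stored_names(arcnames):
    """Add one small file under each arcname and return the names
    as they are read back from the resulting archive."""
    buf = io.BytesIO()
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "payload")
        with open(src, "wb") as fh:
            fh.write(b"x")
        with tarfile.open(fileobj=buf, mode="w") as tar:
            for arcname in arcnames:
                tar.add(src, arcname=arcname)
    buf.seek(0)
    with tarfile.open(fileobj=buf, mode="r") as tar:
        return tar.getnames()


# A leading "./" and an inner "./" are kept as given; only the
# absolute name is converted to a relative one.
names = stored_names(["./control", "pkg/./control", "/etc/control"])
print(names)
```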
msg105922 - (view) Author: Sridhar Ratnakumar (srid) Date: 2010-05-17 17:54
Apparently this fix introduced a regression. See issue8741
History
Date                 User            Action  Args
2010-05-17 17:54:16  srid            set     nosy: + srid; messages: + msg105922
2009-08-28 20:56:24  lars.gustaebel  set     status: open -> closed; resolution: fixed; messages: + msg92044; versions: + Python 3.2, - Python 3.1
2009-05-23 12:39:14  mkv             set     messages: + msg88233
2009-05-23 12:03:19  lars.gustaebel  set     messages: + msg88230; versions: + Python 3.1, Python 2.7, - Python 2.6
2009-05-21 14:44:25  mkv             set     messages: + msg88157
2009-05-21 09:18:37  lars.gustaebel  set     messages: + msg88150
2009-05-18 20:31:43  loewis          set     assignee: lars.gustaebel; nosy: + lars.gustaebel
2009-05-18 16:19:55  mkv             create