classification
Title: tarfile touches directories twice
Type: Stage:
Components: Versions: Python 3.1, Python 3.2, Python 2.7
process
Status: closed Resolution: accepted
Dependencies: Superseder:
Assigned To: lars.gustaebel Nosy List: eric.araujo, lars.gustaebel, loewis, vstinner
Priority: normal Keywords: patch

Created on 2010-10-23 19:06 by loewis, last changed 2010-11-01 21:39 by loewis. This issue is now closed.

Files
File name Uploaded Description Edit
tarfile.diff loewis, 2010-10-23 19:06
tarfile-extract-directory-test.diff lars.gustaebel, 2010-10-24 12:32
tarfile2.diff loewis, 2010-10-24 19:10
Messages (5)
msg119467 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2010-10-23 19:06
When extracting, the time stamps of directories are modified twice: once when creating the directories, and then once in reverse order when done.
The first utimes call is redundant; it also causes issues with a buggy NFS server, see

http://www.python.org/dev/buildbot/3.1/builders/AMD64%20debian%20parallel%203.1/builds/147/steps/test/logs/stdio

(specifically, touching a file twice with the same second-resolution time stamp will increase the microsecond counter).

The attached patch works around the issue; regardless of the NFS bug, I think that the redundant call should be eliminated.
msg119510 - (view) Author: Lars Gustäbel (lars.gustaebel) * (Python committer) Date: 2010-10-24 12:32
The patch goes a bit too far. Though it solves this particular problem with the way TarFile.extractall() works, it breaks TarFile.extract(). Running the testsuite does not expose this, simply because there's no testcase :-(

Please see the attached testcase I just wrote which illustrates the problem.
msg119524 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2010-10-24 19:10
I see. Here is an updated version that makes the touch operations optional.
msg119545 - (view) Author: Lars Gustäbel (lars.gustaebel) * (Python committer) Date: 2010-10-25 11:05
I'm not entirely happy with the name of the "touch" argument. Apart from it being nice and short, I think it's a little too unix-y and might be misleading because it is not only about setting the modification time as the name implies, but also owner and mode. My proposal would be "restore_attrs" or "set_attrs" which isn't half as nice as "touch", but sums up better what's actually done. I leave this up to you.

I think the testcase wouldn't work on Windows the way it is now, would it?

Apart from these minor issues the patch gets my blessing, go ahead ;-)
msg120182 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2010-11-01 21:39
This is now committed as r86102. I opted to call the parameter set_attrs.
History
Date User Action Args
2010-11-01 21:39:48loewissetstatus: open -> closed
resolution: accepted
2010-11-01 21:39:39loewissetmessages: + msg120182
2010-11-01 21:01:35vstinnersetnosy: + vstinner
2010-10-29 19:34:16georg.brandllinkissue10230 superseder
2010-10-25 17:37:05eric.araujosetnosy: + eric.araujo

versions: + Python 3.1, Python 2.7, Python 3.2
2010-10-25 11:05:36lars.gustaebelsetmessages: + msg119545
2010-10-24 19:10:32loewissetfiles: + tarfile2.diff

nosy: - ghaering
messages: + msg119524

assignee: ghaering -> lars.gustaebel
2010-10-24 12:32:03lars.gustaebelsetfiles: + tarfile-extract-directory-test.diff
nosy: + lars.gustaebel
messages: + msg119510

2010-10-23 19:06:31loewiscreate