
classification
Title: Tarfile superfluous truncate calls slow extraction.
Type: performance
Stage: patch review
Components: Library (Lib)
Versions: Python 3.5
process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To: lukasz.langa
Nosy List: asvetlov, fried, lars.gustaebel, lukasz.langa, python-dev
Priority: normal
Keywords: patch

Created on 2016-06-03 04:56 by fried, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Files
File name       Uploaded                 Description
truncate.patch  fried, 2016-06-03 04:56  patch to move truncate for only sparse tar entries.
test.py         fried, 2016-06-03 04:58  test file to generate random tar for benchmark
Messages (4)
msg267035 - (view) Author: Jason Fried (fried) * Date: 2016-06-03 04:56
While extracting large tar files I noticed that tarfile was slower than it should be. On Linux, it seems that for large files (10 MB) truncate is not always a free operation, even when it should be a no-op. Example: the file is already 10 MB, and the code seeks to the end and truncates.
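To illustrate the pattern being described, here is a simplified sketch, not the actual tarfile source. The function name `write_member` and the `always_truncate` flag are hypothetical and exist only for this example. It assumes the idea stated in the patch description: the seek-and-truncate step is only needed for sparse entries, where the file may end in a hole, so regular entries can skip it.

```python
import io
import os
import tempfile


def write_member(source, target_path, size, sparse=None, always_truncate=False):
    """Copy one archive member's data from `source` into `target_path`.

    `sparse` is an optional list of (offset, length) data regions.
    `always_truncate=True` mimics the behaviour the report complains about:
    seeking to the end and truncating even for regular files.
    """
    with open(target_path, "wb") as target:
        if sparse is not None:
            for offset, length in sparse:
                target.seek(offset)
                target.write(source.read(length))
            # A sparse file may end in a hole, so its final size
            # has to be set explicitly.
            target.seek(size)
            target.truncate()
        else:
            remaining = size
            while remaining > 0:
                chunk = source.read(min(remaining, 16 * 1024))
                if not chunk:
                    raise OSError("unexpected end of data")
                target.write(chunk)
                remaining -= len(chunk)
            if always_truncate:
                # The file already has the right length, so this is a
                # logical no-op, but it still costs a system call.
                target.seek(size)
                target.truncate()


# Small demonstration with throwaway files.
demo_dir = tempfile.mkdtemp()
regular_path = os.path.join(demo_dir, "regular.bin")
sparse_path = os.path.join(demo_dir, "sparse.bin")

write_member(io.BytesIO(b"x" * 1000), regular_path, 1000)
write_member(io.BytesIO(b"abcd"), sparse_path, 100, sparse=[(10, 4)])

regular_size = os.path.getsize(regular_path)
sparse_size = os.path.getsize(sparse_path)
```

In this sketch the regular file ends up with the same size whether or not the truncate runs, which is why skipping it for non-sparse entries should be safe.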

I created a script to test the validity of this patch. It generates two random tar archives containing 1024 files of 10 MB each. The file contents are randomized, so disk caching should not interfere.
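The attached test.py is not reproduced in this thread. As a rough sketch of how such an archive could be generated, the helper below writes a tar file whose members contain random bytes. The name `make_random_tar`, the member naming scheme, and the small default sizes are assumptions made for illustration, not details taken from the attachment.

```python
import io
import os
import tarfile
import tempfile


def make_random_tar(path, file_count=4, file_size=1024):
    """Write an uncompressed tar archive of `file_count` random-content files."""
    with tarfile.open(path, "w") as tar:
        for i in range(file_count):
            # Fresh random bytes per member, so no two files share content.
            data = os.urandom(file_size)
            info = tarfile.TarInfo(name="file_{:04d}.bin".format(i))
            info.size = file_size
            tar.addfile(info, io.BytesIO(data))


# Small demonstration archive; a real benchmark would use far larger sizes.
archive_path = os.path.join(tempfile.mkdtemp(), "random.tar")
make_random_tar(archive_path, file_count=3, file_size=2048)

with tarfile.open(archive_path) as tar:
    member_sizes = [member.size for member in tar.getmembers()]
```

Raising `file_count` and `file_size` would approximate the benchmark described above, at the cost of the corresponding disk space.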

Extracting those 1g tar files, I observed the following:
Time Delta for TarFile: 148.23699307441711
Time Delta for FastTarFile: 107.71058106422424
Time Diff: 40.52641201019287 0.27338932859929255
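The two numbers on the "Time Diff" line appear to be the absolute difference between the two timings and that difference as a fraction of the TarFile time. The short check below reproduces that arithmetic from the reported values; the variable names are illustrative only, and the interpretation is an inference from the numbers rather than something stated in the script output.

```python
baseline = 148.23699307441711  # reported "Time Delta for TarFile"
patched = 107.71058106422424   # reported "Time Delta for FastTarFile"

diff = baseline - patched      # absolute time saved
fraction = diff / baseline     # share of the baseline time saved

print("Time Diff:", diff, fraction)
```

Read this way, the patched extraction took roughly 27% less time than the unpatched one in this run.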
msg267037 - (view) Author: Jason Fried (fried) * Date: 2016-06-03 04:58
I ran this on Linux with ext4. I didn't see much improvement on OS X with an SSD.
msg268297 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2016-06-11 23:59
New changeset b63474aa8a5f by Łukasz Langa in branch '3.5':
Issue #27194: superfluous truncate calls in tarfile.py slow down extraction
https://hg.python.org/cpython/rev/b63474aa8a5f

New changeset a4f918de25e5 by Łukasz Langa in branch 'default':
Merge 3.5, issue #27194
https://hg.python.org/cpython/rev/a4f918de25e5
msg268298 - (view) Author: Łukasz Langa (lukasz.langa) * (Python committer) Date: 2016-06-12 00:02
Thanks for the patch, Jason. This is now merged and will be available in 3.5.2 and 3.6.
History
Date                 User              Action  Args
2022-04-11 14:58:31  admin             set     github: 71381
2016-06-12 00:02:19  lukasz.langa      set     status: open -> closed; resolution: fixed; messages: + msg268298
2016-06-11 23:59:33  python-dev        set     nosy: + python-dev; messages: + msg268297
2016-06-11 17:28:55  lukasz.langa      set     assignee: lukasz.langa; stage: patch review
2016-06-03 05:07:30  serhiy.storchaka  set     nosy: + lars.gustaebel
2016-06-03 04:58:33  fried             set     files: + test.py; messages: + msg267037
2016-06-03 04:56:33  fried             create