TarFile expose copyfileobj bufsize to improve throughput #71386
The default of 16k, while good for memory usage, is not well suited for all cases. When we increased this to 4MB we saw a pretty large improvement to tar file creation and extraction on Linux servers, for a 1gb tar file containing 1024 random files each of 10MB in size.
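The effect the reporter describes can be seen directly with shutil.copyfileobj, which tarfile uses internally for member data. Its `length` parameter sets the per-read chunk size (historically 16 KiB; newer CPython versions use a larger default), so a sketch of the proposed tuning looks like:

```python
import io
import shutil

# shutil.copyfileobj copies in fixed-size chunks; passing an explicit
# `length` raises the chunk size and so reduces the number of
# Python-level read/write round trips for large files.
src = io.BytesIO(b"x" * (1 << 20))  # 1 MiB of sample data
dst = io.BytesIO()
shutil.copyfileobj(src, dst, length=4 * 1024 * 1024)  # 4 MiB buffer

assert dst.getvalue() == b"x" * (1 << 20)
```

Fewer, larger reads mean less time spent dispatching Python bytecode per byte copied, which is where the reported throughput gain comes from.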
New feature -> 3.6.
New changeset 0bac85e355b5 by Łukasz Langa in branch 'default':
Thanks for the patch!
cpython uses copyfileobj under the hood for fast copies, but the default buffer size is quite low, which increases the amount of time spent in Python code when copying the SQLite database. As this is usually the bulk of the backup, increasing the buffer can help reduce the backup time quite a bit. Related: python/cpython#71386
This is the same change as home-assistant/core#90613 but for supervisor. If the backup takes too long, core will release the lock on the database and the backup will be no good: https://github.com/home-assistant/core/blob/2fc34e7cced87a8e042919e059d3a07bb760c77f/homeassistant/components/recorder/core.py#L926

cpython uses copyfileobj under the hood for fast copies, but the default buffer size is quite low, which increases the amount of time spent in Python code when copying the SQLite database. As this is usually the bulk of the backup, increasing the buffer can help reduce the backup time quite a bit. Ideally this would all use sendfile under the hood, as that would shift nearly all the burden out of userspace, but tarfile doesn't currently try that: https://github.com/python/cpython/blob/4664a7cf689946f0c9854cadee7c6aa9c276a8cf/Lib/shutil.py#L106

Related: in testing (non-encrypted), the improvement was at least as good as python/cpython#71386
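The change these comments reference exposed the copy buffer on TarFile itself. A minimal sketch, assuming the `copybufsize` keyword that the changeset above added to TarFile (it is accepted by tarfile.open but is not part of the documented API):

```python
import os
import tarfile
import tempfile

# 4 MiB, the buffer size benchmarked in the issue above.
BUF = 4 * 1024 * 1024

with tempfile.TemporaryDirectory() as tmp:
    # Small stand-in for a large database file.
    payload = os.urandom(256 * 1024)
    src = os.path.join(tmp, "data.bin")
    with open(src, "wb") as f:
        f.write(payload)

    archive = os.path.join(tmp, "backup.tar")
    # `copybufsize` is forwarded by tarfile.open() to TarFile.__init__
    # and controls the chunk size used when copying member data.
    with tarfile.open(archive, "w", copybufsize=BUF) as tar:
        tar.add(src, arcname="data.bin")

    with tarfile.open(archive, "r", copybufsize=BUF) as tar:
        extracted = tar.extractfile("data.bin").read()

roundtrip_ok = extracted == payload
```

Because the parameter is undocumented, code that must run on arbitrary Python versions may prefer to fall back to the default buffer if passing it raises a TypeError.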
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.