Message 320763
Buffered reads of large files from a compressed tarfile stream perform poorly.
The buffered read in tarfile's _Stream class builds its result by repeatedly extending a bytes object.
Collecting the chunks in a list and joining them once at the end is much more efficient.
With a list, the read can take seconds instead of minutes.
This performance regression was introduced in b506dc32c1a.
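The difference between the two accumulation strategies can be sketched as follows. This is a minimal illustration, not the actual tarfile code: the function names read_by_concat and read_by_join, the bufsize default, and the use of io.BytesIO as the data source are all assumptions made for the example.

```python
import io


def read_by_concat(src, size, bufsize=16 * 1024):
    """Accumulate by extending a bytes object.

    Each += can copy everything accumulated so far, so the total
    work may grow quadratically with the amount of data read.
    """
    buf = b""
    while len(buf) < size:
        chunk = src.read(min(bufsize, size - len(buf)))
        if not chunk:
            break  # source exhausted early
        buf += chunk
    return buf


def read_by_join(src, size, bufsize=16 * 1024):
    """Collect chunks in a list and join them once at the end."""
    chunks = []
    remaining = size
    while remaining > 0:
        chunk = src.read(min(bufsize, remaining))
        if not chunk:
            break  # source exhausted early
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


data = bytes(range(256)) * 4096  # 1 MiB of sample data
assert read_by_concat(io.BytesIO(data), len(data)) == data
assert read_by_join(io.BytesIO(data), len(data)) == data
```

Both functions return the same bytes; only the way the intermediate result is built differs.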
How to test:
# create a random 50 MB file and pack it into a gzip-compressed tarfile
dd if=/dev/urandom of=test.bin count=50 bs=1M
tar czvf test.tgz test.bin

# read with tarfile as stream (note pipe symbol in 'r|gz')
import tarfile
tfile = tarfile.open("test.tgz", 'r|gz')
for t in tfile:
    file = tfile.extractfile(t)
    if file:
        print(len(file.read()))
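For readers without dd and tar at hand, the same reproduction can be approximated in pure Python. This is a sketch under assumptions: the helper name time_stream_read and its size parameter are hypothetical and do not come from the report, and the archive is written to a temporary directory rather than the current one.

```python
import io
import os
import tarfile
import tempfile
import time


def time_stream_read(size):
    """Build a .tgz holding `size` random bytes, read it back as a
    stream, and return (bytes_read, elapsed_seconds)."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "test.tgz")
        payload = os.urandom(size)

        # Write a single member, test.bin, into a gzip-compressed tarfile.
        with tarfile.open(path, "w:gz") as tf:
            info = tarfile.TarInfo("test.bin")
            info.size = len(payload)
            tf.addfile(info, io.BytesIO(payload))

        start = time.perf_counter()
        total = 0
        # 'r|gz' (pipe, not colon) selects the non-seekable stream reader.
        with tarfile.open(path, "r|gz") as tf:
            for member in tf:
                f = tf.extractfile(member)
                if f:
                    total += len(f.read())
        elapsed = time.perf_counter() - start
    return total, elapsed
```

Calling time_stream_read(50 * 1024 * 1024) approximates the 50 MB case above; the elapsed time depends on the Python version and the machine.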
Date | User | Action | Args
2018-06-30 09:27:00 | hajoscher | set | recipients: + hajoscher
2018-06-30 09:27:00 | hajoscher | set | messageid: <1530350820.63.0.56676864532.issue34010@psf.upfronthosting.co.za>
2018-06-30 09:27:00 | hajoscher | link | issue34010 messages
2018-06-30 09:27:00 | hajoscher | create |