Message406918
I have found that using the timeit module provides more precise measurements:
For a simple gzip header (as returned by gzip.compress or zlib.compress with wbits=31):
./python -m timeit -s "import io; data = b'\x1f\x8b\x08\x00\x00\x00\x00\x00\x00\x03\x03\x00\x00\x00\x00\x00\x00\x00\x00\x00'; from gzip import _read_gzip_header" '_read_gzip_header(io.BytesIO(data))'
For a gzip header with FNAME (as produced by the gzip program itself and by Python's GzipFile):
./python -m timeit -s "import io; data = b'\x1f\x8b\x08\x08j\x1a\x9ea\x02\xffcompressable_file\x00\x03\x00\x00\x00\x00\x00\x00\x00\x00\x00'; from gzip import _read_gzip_header" '_read_gzip_header(io.BytesIO(data))'
For a gzip header with all flags set:
./python -m timeit -s 'import gzip, io; data = b"\x1f\x8b\x08\x1f\x00\x00\x00\x00\x00\xff\x05\x00extraname\x00comment\x00\xe9T"; from gzip import _read_gzip_header' '_read_gzip_header(io.BytesIO(data))'
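To make clear which flags each of the three test headers exercises, here is a minimal sketch that decodes the FLG byte of the fixed 10-byte header. The flag bit values come from RFC 1952; the helper name header_flags is hypothetical and is not part of the gzip module.

```python
import struct

# Flag bits of the FLG byte, as defined in RFC 1952.
FLAGS = {1: "FTEXT", 2: "FHCRC", 4: "FEXTRA", 8: "FNAME", 16: "FCOMMENT"}


def header_flags(data):
    """Return the names of the flags set in the fixed 10-byte gzip header.

    Hypothetical helper for illustration only; not part of the gzip module.
    """
    # Layout: ID1 ID2 (2 bytes), CM, FLG, MTIME (4 bytes, little endian), XFL, OS.
    magic, method, flg, mtime, xfl, os_id = struct.unpack("<2sBBIBB", data[:10])
    if magic != b"\x1f\x8b":
        raise ValueError("not a gzip stream")
    return [name for bit, name in FLAGS.items() if flg & bit]


# The three byte strings used in the timeit commands above.
simple = b'\x1f\x8b\x08\x00\x00\x00\x00\x00\x00\x03\x03\x00\x00\x00\x00\x00\x00\x00\x00\x00'
fname = b'\x1f\x8b\x08\x08j\x1a\x9ea\x02\xffcompressable_file\x00\x03\x00\x00\x00\x00\x00\x00\x00\x00\x00'
all_flags = b"\x1f\x8b\x08\x1f\x00\x00\x00\x00\x00\xff\x05\x00extraname\x00comment\x00\xe9T"

for label, data in [("simple", simple), ("fname", fname), ("all flags", all_flags)]:
    print(label, header_flags(data))
```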
Since performance is most critical for in-memory compression and decompression, I have now optimized for the case where no flags are set.
before (current main): 500000 loops, best of 5: 469 nsec per loop
after (PR): 1000000 loops, best of 5: 390 nsec per loop
For the most common case of only FNAME set:
before: 200000 loops, best of 5: 1.48 usec per loop
after: 200000 loops, best of 5: 1.45 usec per loop
For the case where FHCRC is set:
before: 200000 loops, best of 5: 1.62 usec per loop
after: 100000 loops, best of 5: 2.43 usec per loop
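For easier comparison, the relative change per case can be computed from the timings above. This is just arithmetic on the reported numbers (converted to nanoseconds); the function name relative_change is made up for this sketch.

```python
def relative_change(before_ns, after_ns):
    """Percentage change from before to after; negative means faster."""
    return (after_ns - before_ns) / before_ns * 100


# (before, after) in nanoseconds, copied from the timeit results above.
timings = {
    "no flags": (469, 390),
    "FNAME only": (1480, 1450),
    "all flags (FHCRC set)": (1620, 2430),
}

for case, (before, after) in timings.items():
    print(f"{case}: {relative_change(before, after):+.1f}%")
# no flags: -16.8%
# FNAME only: -2.0%
# all flags (FHCRC set): +50.0%
```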
So this PR is now a clear win for decompressing anything that has been compressed with gzip.compress. It is neutral for normal file decompression. There is a performance cost associated with correctly checking the header, but that is expected. It is better than the alternative of not checking it.
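To illustrate where the extra cost with FHCRC comes from: RFC 1952 defines the header CRC16 as the two least significant bytes of the CRC32 over all header bytes up to, but not including, the CRC16 field. A rough sketch of such a check follows. The function names and the example header are hypothetical, and this is not the code from the PR.

```python
import zlib


def header_crc16(header):
    """CRC16 per RFC 1952: the low 16 bits of the CRC32 of the header bytes."""
    return zlib.crc32(header) & 0xFFFF


def check_header_crc(header_with_crc):
    """Compare the stored little-endian CRC16 with the computed one."""
    header, stored = header_with_crc[:-2], header_with_crc[-2:]
    return header_crc16(header) == int.from_bytes(stored, "little")


# Hypothetical 10-byte header with only FHCRC (0x02) set.
header = b"\x1f\x8b\x08\x02\x00\x00\x00\x00\x00\xff"
signed = header + header_crc16(header).to_bytes(2, "little")
print(check_header_crc(signed))    # True

# Flipping a bit in the stored CRC makes the check fail.
tampered = signed[:-1] + bytes([signed[-1] ^ 0x01])
print(check_header_crc(tampered))  # False
```

The check has to read back every header byte, including the variable-length FEXTRA, FNAME and FCOMMENT fields, so some overhead compared to skipping the CRC is to be expected.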
Date | User | Action | Args
2021-11-24 11:10:50 | rhpvorderman | set | recipients: + rhpvorderman, serhiy.storchaka
2021-11-24 11:10:50 | rhpvorderman | set | messageid: <1637752250.79.0.629261568834.issue45509@roundup.psfhosted.org>
2021-11-24 11:10:50 | rhpvorderman | link | issue45509 messages
2021-11-24 11:10:50 | rhpvorderman | create |