Message390588
The original issue is reported here.
https://discuss.python.org/t/non-optimal-bz2-reading-speed/6869
1. Only BZ2File uses RLock()
lzma and gzip don't use RLock(). It adds significant performance overhead.
When I removed `with self._lock:`, decompression speed improved from about 148k lines/sec to 200k lines/sec.
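The cost of taking an RLock on every call can be approximated with a small sketch. This is not the BZ2File code: `LockedReader` and `PlainReader` are hypothetical stand-ins over `io.BytesIO`, and the timings will vary by machine and Python version.

```python
import io
import threading
import timeit


class LockedReader:
    """Hypothetical reader that guards readline() with an RLock."""

    def __init__(self, data):
        self._buffer = io.BytesIO(data)
        self._lock = threading.RLock()

    def readline(self):
        with self._lock:
            return self._buffer.readline()


class PlainReader:
    """Same hypothetical reader without the lock."""

    def __init__(self, data):
        self._buffer = io.BytesIO(data)

    def readline(self):
        return self._buffer.readline()


def count_lines(reader):
    # Call readline() until it returns an empty bytes object (EOF).
    n = 0
    while reader.readline():
        n += 1
    return n


DATA = b"spam and eggs\n" * 100_000

for cls in (LockedReader, PlainReader):
    elapsed = timeit.timeit(lambda: count_lines(cls(DATA)), number=3)
    print(f"{cls.__name__}: {elapsed:.3f}s for 3 passes")
```

The only difference between the two classes is the `with self._lock:` block, so any gap in the printed times is an estimate of the per-call locking overhead.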
2. The default __iter__ calls `readline()` for each iteration.
BZ2File.readline() is implemented in Python, so it is slower than the C implementation of readline() on the underlying buffer.
When I added this `__iter__()` to BZ2File, decompression speed improved from about 148k lines/sec (or 200k lines/sec without the lock) to 500k lines/sec:
    def __iter__(self):
        self._check_can_read()
        return iter(self._buffer)
If this __iter__ method is safe, it can be added to gzip and lzma too.
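A rough way to try the proposed `__iter__` without patching the standard library is to put it in a subclass. This is only a sketch: `FastIterBZ2File` and `read_all` are hypothetical names, the code relies on the private `_check_can_read()` and `_buffer` members of BZ2File, and it does not settle whether the shortcut is safe in every case.

```python
import bz2
import io
import time


class FastIterBZ2File(bz2.BZ2File):
    """Hypothetical subclass carrying the proposed __iter__."""

    def __iter__(self):
        # Private API: raises if the file is closed or not open for reading.
        self._check_can_read()
        # Private API: iterate the buffered reader directly, so lines come
        # from its C-level iterator instead of the Python-level readline().
        return iter(self._buffer)


# In-memory test data: 50,000 short lines, bz2-compressed.
payload = bz2.compress(b"".join(b"line %d\n" % i for i in range(50_000)))


def read_all(cls):
    """Return (lines, seconds) for iterating the whole payload with cls."""
    with cls(io.BytesIO(payload)) as f:
        start = time.perf_counter()
        lines = list(f)
        return lines, time.perf_counter() - start


default_lines, default_time = read_all(bz2.BZ2File)
fast_lines, fast_time = read_all(FastIterBZ2File)

print(f"default __iter__:  {default_time:.3f}s")
print(f"proposed __iter__: {fast_time:.3f}s")
```

Both variants should yield identical lines; only the time spent per line is expected to differ.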
Date | User | Action | Args
2021-04-09 05:42:16 | methane | set | recipients: + methane
2021-04-09 05:42:16 | methane | set | messageid: <1617946936.07.0.338866506245.issue43785@roundup.psfhosted.org>
2021-04-09 05:42:16 | methane | link | issue43785 messages
2021-04-09 05:42:15 | methane | create |