classification
Title: Optimize BZ2File, GzipFile, and LZMAFile __iter__ method.
Type: performance Stage: resolved
Components: Library (Lib) Versions: Python 3.10
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: corona10, gregory.p.smith, malin, methane
Priority: normal Keywords: patch

Created on 2021-04-09 08:49 by methane, last changed 2021-04-13 04:52 by methane. This issue is now closed.

Pull Requests
URL       Status  Linked
PR 25353  merged  methane, 2021-04-12 06:16
Messages (3)
msg390599 - Author: Inada Naoki (methane) * (Python committer) Date: 2021-04-09 08:49
The `__iter__` method of BZ2File, GzipFile, and LZMAFile is inherited from IOBase.__iter__, which calls `readline()` for each line.

Since `readline()` is defined as a Python function, this is slower than a C iterator. Adding a custom `__iter__` method that delegates to the underlying buffer's `__iter__` makes `for line in file` 2x faster.

    def __iter__(self):
        self._check_can_read()
        return self._buffer.__iter__()
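Here is a minimal, self-contained sketch of the two iteration paths described above. It uses hypothetical wrapper classes (`ReadlineIterFile`, `DelegatingIterFile`) that keep an `io.BufferedReader` in `_buffer`, loosely mirroring the three compression classes. The `_check_can_read` body is a simplified stand-in, not the stdlib code. The first class inherits `IOBase.__iter__`, so every line goes through the Python-level `readline()`. The second adds the proposed `__iter__`.

```python
import io
import timeit


class ReadlineIterFile(io.BufferedIOBase):
    """Hypothetical read-only wrapper around an in-memory BufferedReader.

    It does not define __iter__, so iteration uses IOBase.__iter__ and
    IOBase.__next__, which call the Python-level readline() once per line.
    """

    def __init__(self, raw_bytes):
        self._buffer = io.BufferedReader(io.BytesIO(raw_bytes))

    def readable(self):
        return True

    def _check_can_read(self):
        # Simplified stand-in: only checks that the wrapper is still open.
        if self.closed:
            raise ValueError("I/O operation on closed file")

    def readline(self, size=-1):
        # Interpreted method call on every iteration step.
        self._check_can_read()
        return self._buffer.readline(size)


class DelegatingIterFile(ReadlineIterFile):
    """Same wrapper plus the __iter__ proposed in this issue."""

    def __iter__(self):
        self._check_can_read()
        # Hand iteration to the BufferedReader, whose line iterator is
        # implemented in C and never calls the wrapper's readline().
        return self._buffer.__iter__()


if __name__ == "__main__":
    data = b"".join(b"line %d\n" % i for i in range(50_000))

    # Both classes yield exactly the same lines.
    assert list(ReadlineIterFile(data)) == list(DelegatingIterFile(data))

    for cls in (ReadlineIterFile, DelegatingIterFile):
        seconds = timeit.timeit(lambda: sum(1 for _ in cls(data)), number=3)
        print(f"{cls.__name__}: {seconds:.3f}s")
```

The timing loop shows the kind of comparison behind the 2x figure. On a plain in-memory buffer, the ratio depends on the machine and interpreter version. It does not reproduce the measurements for the real compressed-file classes.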

---

The original issue was reported here:
https://discuss.python.org/t/non-optimal-bz2-reading-speed/6869
This issue is related to #43785.
msg390836 - Author: Ma Lin (malin) * Date: 2021-04-12 11:03
I think this change is safe.

The behavior should be exactly the same, except that the iterators are different objects (obj vs. obj._buffer).
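This small sketch shows the one observable difference noted above. It uses a hypothetical wrapper class (`DelegatingIterFile`) with an `io.BufferedReader` in `_buffer`, not the real stdlib classes. `iter()` returns the inner buffer instead of the file object. Every path reads from the same buffer, so interleaving `next()`, `readline()`, and the iterator still sees one shared position.

```python
import io


class DelegatingIterFile(io.BufferedIOBase):
    """Hypothetical wrapper with the delegating __iter__ from this issue."""

    def __init__(self, raw_bytes):
        self._buffer = io.BufferedReader(io.BytesIO(raw_bytes))

    def readable(self):
        return True

    def readline(self, size=-1):
        return self._buffer.readline(size)

    def __iter__(self):
        if self.closed:
            raise ValueError("I/O operation on closed file")
        return self._buffer.__iter__()


f = DelegatingIterFile(b"one\ntwo\nthree\nfour\n")

it = iter(f)
# The visible difference: iter() returns the inner buffer, not f.
print(it is f)          # False
print(it is f._buffer)  # True: BufferedReader.__iter__ returns itself

# All paths read from the same buffer, so they share one position.
print(next(it))         # b'one\n'    (buffer's C iterator)
print(f.readline())     # b'two\n'    (wrapper's readline)
print(next(f))          # b'three\n'  (IOBase.__next__ calls f.readline())
print(list(it))         # [b'four\n']
```

With the inherited `IOBase.__iter__`, `iter(f) is f` would be `True` instead.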
msg390921 - Author: Inada Naoki (methane) * (Python committer) Date: 2021-04-13 04:51
New changeset d2a8e69c2c605fbaa3656a5f99aa8d295f74c80e by Inada Naoki in branch 'master':
bpo-43787: Add __iter__ to GzipFile, BZ2File, and LZMAFile (GH-25353)
https://github.com/python/cpython/commit/d2a8e69c2c605fbaa3656a5f99aa8d295f74c80e
History
Date                 User             Action  Args
2021-04-13 04:52:26  methane          set     status: open -> closed
                                              resolution: fixed
                                              stage: patch review -> resolved
2021-04-13 04:51:56  methane          set     messages: + msg390921
2021-04-12 11:03:14  malin            set     messages: + msg390836
2021-04-12 07:15:36  malin            set     nosy: + malin
2021-04-12 06:16:53  methane          set     keywords: + patch
                                              stage: needs patch -> patch review
                                              pull_requests: + pull_request24088
2021-04-11 21:36:32  gregory.p.smith  set     nosy: + gregory.p.smith
2021-04-11 21:36:27  gregory.p.smith  set     stage: needs patch
2021-04-09 09:30:39  corona10         set     nosy: + corona10
2021-04-09 08:49:43  methane          create