
classification
Title: gzip fails to read a gzipped file (ValueError: readline of closed file)
Type: behavior Stage: resolved
Components: Library (Lib) Versions: Python 3.11, Python 3.10
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: methane, minstrelofc, miss-islington, xtreak
Priority: normal Keywords: 3.10regression, patch

Created on 2021-10-14 21:12 by minstrelofc, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Files
UTF-8-test_for_gzip.txt.gz, uploaded by minstrelofc, 2021-10-14 21:12. Description: gzip test file, just in case
Pull Requests
PR 29016, status: merged, linked by methane, 2021-10-18 01:49
PR 29050, status: merged, linked by miss-islington, 2021-10-19 02:52
Messages (5)
msg403948 - (view) Author: (minstrelofc) Date: 2021-10-14 21:12
Attempting to iterate over an opened gzip file raises a ValueError: readline of closed file

Behavior in Python 3.9.7:
Python 3.9.7 (default, Oct 13 2021, 09:08:19) 
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import gzip
>>> ll = [l for l in gzip.GzipFile(filename='data/UTF-8-test_for_gzip.txt.gz')]
>>> len(ll)
300


Behavior in Python 3.10.0 (and 3.11.0a1 is the same):
Python 3.10.0 (default, Oct 13 2021, 08:53:15) [GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import gzip
>>> ll = [l for l in gzip.GzipFile(filename='data/UTF-8-test_for_gzip.txt.gz')]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 1, in <listcomp>
ValueError: readline of closed file


This only happens when iterating directly over a GzipFile object that is not otherwise referenced. Using a with statement gives the correct behaviour in both 3.10 and 3.11:
>>> with gzip.GzipFile(filename='UTF-8-test_for_gzip.txt.gz') as input_file:
...     len(list(input_file))
... 
300
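
For anyone without the attached file, the report can be approximated with a self-contained script. This is only a sketch: the helper names (make_sample, count_direct, count_with) and the generated 300-line sample are assumptions made for illustration, not part of the original report. On an affected interpreter (3.10.0 or 3.11.0a1 per this report) the direct iteration is expected to raise the ValueError, while the with-statement form should return 300 everywhere.

```python
import gzip
import os
import tempfile


def make_sample(path, n_lines=300):
    """Write a small gzip file with n_lines lines (hypothetical test data)."""
    with gzip.open(path, "wb") as f:
        for i in range(n_lines):
            f.write(f"line {i}\n".encode("utf-8"))


def count_direct(path):
    """Iterate over a temporary GzipFile that nothing else references."""
    try:
        return len([line for line in gzip.GzipFile(filename=path)])
    except ValueError as exc:
        # Affected versions end up here with "readline of closed file".
        return f"ValueError: {exc}"


def count_with(path):
    """Iterate inside a with block, which keeps the GzipFile alive and closes it."""
    with gzip.GzipFile(filename=path) as input_file:
        return len(list(input_file))


with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, "sample.txt.gz")
    make_sample(sample)
    direct_result = count_direct(sample)
    with_result = count_with(sample)
    print("direct iteration:", direct_result)
    print("with statement:  ", with_result)
```

Running the same script under 3.9 and 3.10.0 should show the difference only in the "direct iteration" line.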
msg403981 - (view) Author: Karthikeyan Singaravelan (xtreak) * (Python committer) Date: 2021-10-15 07:34
This might be related to the commit below:

commit d2a8e69c2c605fbaa3656a5f99aa8d295f74c80e
Author: Inada Naoki <songofacandy@gmail.com>
Date:   Tue Apr 13 13:51:49 2021 +0900

    bpo-43787: Add __iter__ to GzipFile, BZ2File, and LZMAFile (GH-25353)


python -m gzip README.rst
(myenv) ➜  cpython git:(main) ✗ git checkout d2a8e69c2c605fbaa3656a5f99aa8d295f74c80e~1 Lib/gzip.py
Updated 1 path from 2ea7c00ab4
(myenv) ➜  cpython git:(main) ✗ ./python
Python 3.11.0a1+ (heads/main:160c38df7f, Oct 15 2021, 11:25:16) [GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import gzip
>>> len([None for _ in gzip.GzipFile("README.rst.gz")])
267
>>> 
(myenv) ➜  cpython git:(main) ✗ git checkout d2a8e69c2c605fbaa3656a5f99aa8d295f74c80e Lib/gzip.py 
Updated 1 path from 1f9874eec6
(myenv) ➜  cpython git:(main) ✗ ./python
Python 3.11.0a1+ (heads/main:160c38df7f, Oct 15 2021, 11:25:16) [GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import gzip
>>> len([None for _ in gzip.GzipFile("README.rst.gz")])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 1, in <listcomp>
ValueError: readline of closed file
msg404061 - (view) Author: Inada Naoki (methane) * (Python committer) Date: 2021-10-16 02:26
>>> ll = [l for l in gzip.GzipFile(filename='data/UTF-8-test_for_gzip.txt.gz')]

This is a bad code pattern because you don't close the file explicitly.
Actually, the error is caused by an optimization: iter(GzipFile) returns the underlying, faster iterator, which doesn't hold a reference to the GzipFile. So the GzipFile is collected while the iteration is still running, and GzipFile.__del__ closes the file.

Although this is caused by a bad code pattern, I must admit this is a regression.
We need to call a slow Python function for each line instead of using the fast C iterator...
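
The mechanism described above can be imitated without gzip at all. The following is a minimal sketch, not the actual Lib/gzip.py code: Wrapper, FastIterWrapper, SafeIterWrapper and count_lines are hypothetical names, and io.BytesIO stands in for the internal buffered reader. It also assumes CPython's reference counting, where an unreferenced object is finalized immediately.

```python
import io


class Wrapper:
    """Hypothetical stand-in for GzipFile: owns a buffer and closes it in __del__."""

    def __init__(self, data):
        self._buffer = io.BytesIO(data)

    def close(self):
        self._buffer.close()

    def __del__(self):
        self.close()

    def readline(self):
        return self._buffer.readline()


class FastIterWrapper(Wrapper):
    """Mimics the optimization: hand out the inner iterator directly."""

    def __iter__(self):
        # The returned iterator keeps no reference to self, so an otherwise
        # unreferenced wrapper can be finalized as soon as iteration starts.
        return iter(self._buffer)


class SafeIterWrapper(Wrapper):
    """Mimics the slower approach: the wrapper is its own iterator."""

    def __iter__(self):
        return self

    def __next__(self):
        # One Python-level readline() call per line; the loop holds a
        # reference to self, so the buffer stays open until iteration ends.
        line = self.readline()
        if not line:
            raise StopIteration
        return line


def count_lines(wrapper_cls, data):
    """Iterate over a temporary wrapper, as in the reported one-liner."""
    try:
        return len([line for line in wrapper_cls(data)])
    except ValueError as exc:
        return f"ValueError: {exc}"


print(count_lines(SafeIterWrapper, b"a\nb\nc\n"))
print(count_lines(FastIterWrapper, b"a\nb\nc\n"))
```

Under CPython the first call should count the lines normally, while the second should report a ValueError about a closed file, because the temporary wrapper is finalized right after its __iter__ returns.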
msg404263 - (view) Author: Inada Naoki (methane) * (Python committer) Date: 2021-10-19 02:52
New changeset 0a4c82ddd34a3578684b45b76f49cd289a08740b by Inada Naoki in branch 'main':
bpo-45475: Revert `__iter__` optimization for GzipFile, BZ2File, and LZMAFile. (GH-29016)
https://github.com/python/cpython/commit/0a4c82ddd34a3578684b45b76f49cd289a08740b
msg404265 - (view) Author: miss-islington (miss-islington) Date: 2021-10-19 03:15
New changeset 97ce855ca8ce437070424b43f5b41158685ac140 by Miss Islington (bot) in branch '3.10':
bpo-45475: Revert `__iter__` optimization for GzipFile, BZ2File, and LZMAFile. (GH-29016)
https://github.com/python/cpython/commit/97ce855ca8ce437070424b43f5b41158685ac140
History
Date / User / Action / Args
2022-04-11 14:59:51 / admin / set / github: 89638
2021-10-19 03:32:27 / methane / set / status: open -> closed; resolution: fixed; stage: patch review -> resolved
2021-10-19 03:15:56 / miss-islington / set / messages: + msg404265
2021-10-19 02:52:07 / miss-islington / set / nosy: + miss-islington; pull_requests: + pull_request27321
2021-10-19 02:52:00 / methane / set / messages: + msg404263
2021-10-18 01:49:40 / methane / set / keywords: + patch; stage: patch review; pull_requests: + pull_request27290
2021-10-16 02:26:15 / methane / set / messages: + msg404061
2021-10-15 07:34:49 / xtreak / set / nosy: + xtreak, methane; messages: + msg403981; components: + Library (Lib); keywords: + 3.10regression
2021-10-14 21:12:01 / minstrelofc / create