classification
Title: gzip.open() needs an optional encoding argument
Type: enhancement
Stage: resolved
Components: Library (Lib)
Versions: Python 3.3
process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To:
Nosy List: amaury.forgeotdarc, daniel.urban, eric.araujo, nadeem.vawda, pitrou, rasmusory, rhettinger, serhiy.storchaka
Priority: normal
Keywords: patch

Created on 2011-07-14 15:31 by rhettinger, last changed 2012-06-26 21:02 by nadeem.vawda. This issue is now closed.

Files
File name Uploaded Description
issue12559.patch daniel.urban, 2011-07-14 19:23 Patch 1
Messages (7)
msg140341 - Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2011-07-14 15:31
gzip.open() should parallel the built-in open() so that gzipped files can be read in the same way as regular files:

for line in gzip.open('notes.txt', 'r', encoding='latin-1'):
    print(line.rstrip())
msg140369 - Author: Daniel Urban (daniel.urban) * (Python triager) Date: 2011-07-14 19:23
Here is a patch. If the code changes are acceptable I can also make a documentation patch.

(I'm surprised to see 3.2 in "Versions". I thought 3.2 only gets bugfixes...)
msg140373 - Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2011-07-14 20:44
There remains a difference between open() and gzip.open():
open(filename, 'r', encoding=None) is a text file (with a default encoding), gzip.open() with the same arguments returns a binary file.

Don't know how to fix this though.
msg140407 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-07-15 13:57
If we go this way, the "errors" and "newline" arguments should be added as well.
msg140462 - Author: Daniel Urban (daniel.urban) * (Python triager) Date: 2011-07-15 19:16
> If we go this way, the "errors" and "newline" arguments should be added
> as well.

Yeah, I thought about that. I can make a new patch that implements this, if needed. However, there seems to be a real problem: the one that Amaury Forgeot d'Arc mentioned. I can't think of a way to solve it in a backwards-compatible way.
msg163928 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-06-25 10:30
Why not use io.TextWrapper? I think it is the right answer for this issue.
msg164106 - Author: Nadeem Vawda (nadeem.vawda) * (Python committer) Date: 2012-06-26 21:02
I already fixed this without knowing about this issue; see 55202ca694d7.


storchaka:
> Why not use io.TextWrapper? I think it is the right answer for this issue.

The proposed patch (and the code I committed) *do* use TextIOWrapper.

Unless you mean that callers should create the TextIOWrapper themselves.
This is certainly possible, but quite inconvenient for something that is
conceptually simple, and not difficult to implement.
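
For comparison, a minimal sketch of what creating the TextIOWrapper by hand might look like. The helper name open_gzip_as_text and the sample file are hypothetical and exist only for this illustration; this is not the committed code.

```python
import gzip
import io
import os
import tempfile


def open_gzip_as_text(path, encoding):
    """Wrap a binary gzip stream in a TextIOWrapper for decoded reads.

    Hypothetical helper for illustration; not part of the gzip module.
    """
    binary_stream = gzip.open(path, "rb")
    # Closing the wrapper also closes the underlying gzip stream.
    return io.TextIOWrapper(binary_stream, encoding=encoding)


# Hypothetical sample file, created only for this demonstration.
demo_path = os.path.join(tempfile.mkdtemp(), "notes.txt.gz")
with gzip.open(demo_path, "wb") as raw:
    raw.write("caf\u00e9\nsecond line\n".encode("latin-1"))

with open_gzip_as_text(demo_path, "latin-1") as text_stream:
    manual_lines = [line.rstrip() for line in text_stream]

print(manual_lines)
```

Every caller would have to repeat this wrapping step, which is the inconvenience described above.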


amaury.forgeotdarc:
> There remains a difference between open() and gzip.open():
> open(filename, 'r', encoding=None) is a text file (with a default encoding), gzip.open() with the same arguments returns a binary file.

The committed code unfortunately still has gzip.open(filename, "r")
returning a binary file. This is something that cannot be fixed without
breaking backward compatibility.
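
A short sketch of that behaviour, assuming Python 3.3 or later. The file name and contents are made up for the example. Mode "r" yields bytes, and passing an encoding together with a binary mode is rejected with a ValueError.

```python
import gzip
import os
import tempfile

# Hypothetical sample file, created only for this demonstration.
sample_path = os.path.join(tempfile.mkdtemp(), "notes.txt.gz")
with gzip.open(sample_path, "wb") as f:
    f.write("caf\u00e9\n".encode("latin-1"))

# Mode "r" means binary for gzip.open(), unlike the built-in open().
with gzip.open(sample_path, "r") as f:
    raw_data = f.read()
print(type(raw_data))

# An encoding argument is not accepted in binary mode.
try:
    gzip.open(sample_path, "r", encoding="latin-1")
    binary_encoding_rejected = False
except ValueError:
    binary_encoding_rejected = True
print(binary_encoding_rejected)
```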

However, it does provide a way to open a text file with the system's
default encoding (encoding=None, or no encoding argument specified).
To do this, you can use the "rt"/"wt"/"at" modes, just like with
builtins.open(). Of course, this also works if you do specify an encoding
explicitly.
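
A brief sketch of the text modes in use, assuming Python 3.3 or later. The file name and contents are invented for the example.

```python
import gzip
import os
import tempfile

# Hypothetical sample file, created only for this demonstration.
text_path = os.path.join(tempfile.mkdtemp(), "notes.txt.gz")

# "wt" writes text, encoding it before compression.
with gzip.open(text_path, "wt", encoding="latin-1") as f:
    f.write("caf\u00e9\n")

# "at" appends more text to the same file.
with gzip.open(text_path, "at", encoding="latin-1") as f:
    f.write("second line\n")

# "rt" reads text; errors and newline are accepted as with the built-in open().
with gzip.open(text_path, "rt", encoding="latin-1", errors="strict") as f:
    text_lines = [line.rstrip() for line in f]

print(text_lines)
```

Omitting the encoding argument in these modes falls back to the system's default encoding, as described above.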
History
Date User Action Args
2012-06-26 21:02:21 nadeem.vawda set
    status: open -> closed
    versions: + Python 3.3, - Python 3.4
    messages: + msg164106
    resolution: fixed
    stage: patch review -> resolved
2012-06-25 10:30:41 serhiy.storchaka set
    nosy: + serhiy.storchaka
    messages: + msg163928
2012-06-25 05:35:52 eric.araujo set
    nosy: + nadeem.vawda
    versions: + Python 3.4, - Python 3.3
2011-11-25 13:26:46 rasmusory set
    nosy: + rasmusory
2011-07-18 16:04:52 eric.araujo set
    nosy: + eric.araujo
2011-07-15 19:16:29 daniel.urban set
    messages: + msg140462
2011-07-15 13:57:15 pitrou set
    nosy: + pitrou
    messages: + msg140407
    stage: patch review
2011-07-15 00:37:02 rhettinger set
    versions: - Python 3.2
2011-07-14 20:44:14 amaury.forgeotdarc set
    nosy: + amaury.forgeotdarc
    messages: + msg140373
2011-07-14 19:23:10 daniel.urban set
    files: + issue12559.patch
    nosy: + daniel.urban
    messages: + msg140369
    keywords: + patch
2011-07-14 15:31:22 rhettinger create