
Author maubp
Recipients maubp
Date 2017-04-07.09:29:46
Message-id <1491557387.23.0.595235956819.issue30012@psf.upfronthosting.co.za>

Under Python 2, gzip.open defaults to returning (non-unicode) strings.

Under Python 3, gzip.open defaults to returning bytes, so support for text mode was added (in Python 3.3) to allow strings to be returned instead; see http://bugs.python.org/issue13989
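A minimal Python 3 sketch of that difference, using a tiny in-memory gzip payload as an illustrative stand-in for a real file (passing a file object to gzip.open is itself a Python 3.3+ feature; a filename behaves the same way):

```python
import gzip
import io

# Illustrative data only: one gzip-compressed line of text held in memory.
payload = gzip.compress(b"LOCUS example\n")

# Default mode is "rb": lines come back as bytes.
with gzip.open(io.BytesIO(payload)) as handle:
    binary_line = handle.readline()

# Text mode "rt" (Python 3.3+): lines are decoded and come back as str.
with gzip.open(io.BytesIO(payload), "rt", encoding="ascii") as handle:
    text_line = handle.readline()

print(repr(binary_line))  # b'LOCUS example\n'
print(repr(text_line))    # 'LOCUS example\n'
```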

In order to write Python 2 and 3 compatible code to get strings from gzip, I now use:

>>> import gzip
>>> handle = gzip.open(filename, "rt")
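One possible workaround, sketched under the assumption that the goal is simply a native str on each Python version: because Python 2's binary mode already returns plain (non-unicode) str objects, the hypothetical helper open_gzip_text below only asks for "rt" on Python 3 and uses "rb" on Python 2, which also sidesteps the Windows problem reported below. The helper name and its encoding parameter are illustrative, not part of the gzip API.

```python
import gzip
import sys


def open_gzip_text(filename, encoding=None):
    """Hypothetical helper: open a gzipped text file so lines are native str.

    On Python 3 this uses text mode ("rt"), decoding with `encoding`
    (None means the locale default, as with a plain "rt").
    On Python 2 it uses binary mode ("rb"), since the native str type there
    is already a byte string, and "rb" avoids the "rtb" mode entirely.
    """
    if sys.version_info[0] >= 3:
        # Python 3.3+: gzip.open supports text mode and returns str lines.
        return gzip.open(filename, "rt", encoding=encoding)
    # Python 2: "rb" yields plain (non-unicode) str lines.
    return gzip.open(filename, "rb")
```

Usage would then be, for example, handle = open_gzip_text('ls_orchid.gbk.gz') followed by handle.readline(), giving a str on both Python 2 and 3.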

In general, mode="rt" works well, but I just found that it fails under Windows XP running Python 2.7. The examples below use the following gzipped plain-text file:

https://github.com/biopython/biopython/blob/master/Doc/examples/ls_orchid.gbk.gz

This works perfectly on Linux, giving strings on both Python 2 and 3. Note that I am printing with repr to confirm we have a string object:

$ python2.7 -c "import gzip; print(repr(gzip.open('ls_orchid.gbk.gz', 'rt').readline())); import sys; print(sys.version)"
'LOCUS       Z78533                   740 bp    DNA     linear   PLN 30-NOV-2006\n'
2.7.10 (default, Sep 28 2015, 13:58:31) 
[GCC 4.4.7 20120313 (Red Hat 4.4.7-16)]

It also works with a slightly newer Python 2.7, and with Python 3.3, 3.4 and 3.5:

$ /mnt/apps/python/2.7/bin/python  -c "import gzip; print(repr(gzip.open('ls_orchid.gbk.gz', 'rt').readline())); import sys; print(sys.version)"
'LOCUS       Z78533                   740 bp    DNA     linear   PLN 30-NOV-2006\n'
2.7.13 (default, Mar  9 2017, 15:07:48) 
[GCC 4.9.2 20150212 (Red Hat 4.9.2-6)]

$ python3.5 -c "import gzip; print(repr(gzip.open('ls_orchid.gbk.gz', 'rt').readline())); import sys; print(sys.version)"
'LOCUS       Z78533                   740 bp    DNA     linear   PLN 30-NOV-2006\n'
3.5.0 (default, Sep 28 2015, 11:25:31) 
[GCC 4.4.7 20120313 (Red Hat 4.4.7-16)]

$ python3.4 -c "import gzip; print(repr(gzip.open('ls_orchid.gbk.gz', 'rt').readline())); import sys; print(sys.version)"
'LOCUS       Z78533                   740 bp    DNA     linear   PLN 30-NOV-2006\n'
3.4.3 (default, Aug 21 2015, 11:12:32) 
[GCC 4.4.7 20120313 (Red Hat 4.4.7-11)]

$ python3.3 -c "import gzip; print(repr(gzip.open('ls_orchid.gbk.gz', 'rt').readline())); import sys; print(sys.version)"
'LOCUS       Z78533                   740 bp    DNA     linear   PLN 30-NOV-2006\n'
3.3.0 (default, Nov  7 2012, 21:52:39) 
[GCC 4.4.6 20120305 (Red Hat 4.4.6-4)]


This works perfectly on macOS, giving strings on both Python 2 and 3:


$ python2.7 -c "import gzip; print(repr(gzip.open('ls_orchid.gbk.gz', 'rt').readline())); import sys; print(sys.version)"
'LOCUS       Z78533                   740 bp    DNA     linear   PLN 30-NOV-2006\n'
2.7.10 (default, Jul 30 2016, 19:40:32) 
[GCC 4.2.1 Compatible Apple LLVM 8.0.0 (clang-800.0.34)]

$ python3.6 -c "import gzip; print(repr(gzip.open('ls_orchid.gbk.gz', 'rt').readline())); import sys; print(sys.version)"
'LOCUS       Z78533                   740 bp    DNA     linear   PLN 30-NOV-2006\n'
3.6.0 (v3.6.0:41df79263a11, Dec 22 2016, 17:23:13) 
[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)]


This also works perfectly with Python 3 running on Windows XP:


C:\repositories\biopython\Doc\examples>c:\Python33\python.exe -c "import gzip; print(repr(gzip.open('ls_orchid.gbk.gz', 'rt').readline())); import sys; print(sys.version)"
'LOCUS       Z78533                   740 bp    DNA     linear   PLN 30-NOV-2006\n'
3.3.5 (v3.3.5:62cf4e77f785, Mar  9 2014, 10:37:12) [MSC v.1600 32 bit (Intel)]

C:\repositories\biopython\Doc\examples>C:\Python34\python.exe -c "import gzip; print(repr(gzip.open('ls_orchid.gbk.gz', 'rt').readline())); import sys; print(sys.version)"
'LOCUS       Z78533                   740 bp    DNA     linear   PLN 30-NOV-2006\n'
3.4.4 (v3.4.4:737efcadf5a6, Dec 20 2015, 19:28:18) [MSC v.1600 32 bit (Intel)]



However, it fails on Windows XP running Python 2.7.11 and (after upgrading) Python 2.7.13:


C:\repositories\biopython\Doc\examples>c:\Python27\python -c "import sys; print(sys.version); import gzip; print(repr(gzip.open('ls_orchid.gbk.gz', 'rt').readlines()))"
2.7.13 (v2.7.13:a06454b1afa1, Dec 17 2016, 20:42:59) [MSC v.1500 32 bit (Intel)]

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "c:\Python27\lib\gzip.py", line 34, in open
    return GzipFile(filename, mode, compresslevel)
  File "c:\Python27\lib\gzip.py", line 94, in __init__
    fileobj = self.myfileobj = __builtin__.open(filename, mode or 'rb')
ValueError: Invalid mode ('rtb')
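The traceback shows the built-in open being called with mode 'rtb', and the repr output from Linux and macOS reports the same 'rtb' mode. A minimal sketch of how such a mode arises, assuming Python 2.7's GzipFile appends "b" to any mode lacking it before opening the underlying file; py27_gzip_file_mode is a hypothetical standalone re-creation for illustration, not the library's own function:

```python
def py27_gzip_file_mode(mode):
    """Hypothetical re-creation of Python 2.7 GzipFile's mode adjustment.

    Binary mode is forced by appending "b" when it is missing, so a text-mode
    request such as "rt" is passed on to the built-in open() as "rtb",
    asking for text and binary at the same time.
    """
    if mode and "b" not in mode:
        mode += "b"
    # Mirrors the `mode or 'rb'` fallback visible in the traceback.
    return mode or "rb"


for requested in ("r", "rb", "rt", ""):
    print(repr(requested), "->", repr(py27_gzip_file_mode(requested)))
```

Linux and macOS builds of Python 2.7 accepted the resulting 'rtb', while the Windows builds rejected it with the ValueError above.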


Note that this strangely contradictory mode ('rtb', requesting both text and binary) seems to be accepted by Python 2.7 under Linux and macOS:


$ python
Python 2.7.10 (default, Sep 28 2015, 13:58:31) 
[GCC 4.4.7 20120313 (Red Hat 4.4.7-16)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import gzip
>>> gzip.open('ls_orchid.gbk.gz', 'rt')
<gzip open file 'ls_orchid.gbk.gz', mode 'rtb' at 0x7f9af30c2f60 0x7f9aed1e5e50>
>>> quit()


$ python2.7
Python 2.7.10 (default, Jul 30 2016, 19:40:32) 
[GCC 4.2.1 Compatible Apple LLVM 8.0.0 (clang-800.0.34)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import gzip
>>> gzip.open('ls_orchid.gbk.gz', 'rt')
<gzip open file 'ls_orchid.gbk.gz', mode 'rtb' at 0x10282c6f0 0x10287ef10>
>>> quit()
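For comparison, a Python 3 sketch (not part of the original report): there gzip.open checks the mode itself before opening anything, so a request for "rtb" raises ValueError on every platform rather than depending on how the operating system's open() treats it. The in-memory data is an illustrative assumption, and the exact error message may vary between Python 3 versions.

```python
import gzip
import io

# Illustrative in-memory gzip data; any gzip file would do.
compressed = io.BytesIO(gzip.compress(b"LOCUS example\n"))

# Python 3's gzip.open refuses a mode asking for both text ("t") and
# binary ("b"), instead of passing "rtb" on to the underlying open().
try:
    gzip.open(compressed, "rtb")
    rejected = False
except ValueError as err:
    rejected = True
    print("gzip.open rejected 'rtb':", err)
```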
History
Date                 User   Action  Args
2017-04-07 09:29:47  maubp  set     recipients: + maubp
2017-04-07 09:29:47  maubp  set     messageid: <1491557387.23.0.595235956819.issue30012@psf.upfronthosting.co.za>
2017-04-07 09:29:47  maubp  link    issue30012 messages
2017-04-07 09:29:46  maubp  create