Author maubp
Recipients maubp
Date 2013-04-08.17:55:11
Message-id <1365443712.06.0.547975068624.issue17666@psf.upfronthosting.co.za>
Content
Regression from Python 3.3.0 to 3.3.1, tested under Mac OS X 10.8 and 64-bit CentOS Linux.

The same regression is also present going from Python 2.7.3 to 2.7.4. Does that need a separate issue filed?

Consider this VALID GZIP file (human-readable link):
https://github.com/biopython/biopython/blob/master/Tests/GenBank/cor6_6.gb.bgz

Binary link (only a small file):
https://raw.github.com/biopython/biopython/master/Tests/GenBank/cor6_6.gb.bgz

This file is compressed using a GZIP variant called BGZF, which uses multiple blocks (concatenated gzip members) and records additional tags in each block header's optional extra field (the FEXTRA flag). For background see:
http://blastedbio.blogspot.com/2011/11/bgzf-blocked-bigger-better-gzip.html
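To illustrate what those extra header tags look like, here is a minimal Python sketch that hand-builds two BGZF-style blocks, each a complete gzip member with the FEXTRA flag set and a "BC" subfield holding BSIZE (total block size minus one), and then decompresses their concatenation with the standard library on an unaffected Python version. The helper name bgzf_like_block is hypothetical, and the sketch ignores other BGZF details such as the 64 KiB block size limit and the empty end-of-file marker block.

```python
import gzip
import struct
import zlib


def bgzf_like_block(payload: bytes) -> bytes:
    """Hypothetical helper: one gzip member with a BGZF-style 'BC' extra subfield."""
    # Raw deflate stream (negative wbits means no zlib header or trailer).
    compressor = zlib.compressobj(6, zlib.DEFLATED, -15)
    cdata = compressor.compress(payload) + compressor.flush()

    xlen = 6                      # SI1, SI2, SLEN (2 bytes), BSIZE (2 bytes)
    header_len = 10 + 2 + xlen    # fixed header + XLEN + extra field = 18 bytes
    total = header_len + len(cdata) + 8  # + CRC32 and ISIZE trailer
    # Extra subfield: SI1='B', SI2='C', SLEN=2, BSIZE = total block size - 1.
    extra = b"BC" + struct.pack("<HH", 2, total - 1)

    header = (
        b"\x1f\x8b"               # gzip magic
        + b"\x08"                 # CM = deflate
        + b"\x04"                 # FLG = FEXTRA
        + struct.pack("<I", 0)    # MTIME
        + b"\x00"                 # XFL
        + b"\xff"                 # OS = unknown
        + struct.pack("<H", xlen)
        + extra
    )
    trailer = struct.pack("<II", zlib.crc32(payload) & 0xFFFFFFFF,
                          len(payload) & 0xFFFFFFFF)
    block = header + cdata + trailer
    assert len(block) == total
    return block


# Two independent blocks concatenated form a valid multi-member gzip stream.
data = bgzf_like_block(b"first block\n") + bgzf_like_block(b"second block\n")
print(gzip.decompress(data))  # b'first block\nsecond block\n'
```

Because every block is a self-contained gzip member, a plain gzip reader simply decompresses them one after another; a BGZF-aware reader can additionally use BSIZE to jump between blocks.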

$ curl -O https://raw.github.com/biopython/biopython/master/Tests/GenBank/cor6_6.gb.bgz
$ cat cor6_6.gb.bgz | gunzip | wc
     320    1183   14967

Now for the bug. Expected behaviour, under Python 3.2:

$ python3.2
Python 3.2 (r32:88445, Feb 28 2011, 17:04:33) 
[GCC 4.2.1 (Apple Inc. build 5664)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import gzip
>>> handle = gzip.open("cor6_6.gb.bgz", "rb")
>>> data = handle.read()
>>> handle.close()
>>> len(data)
14967
>>> quit()

Broken behaviour:

$ python3.3
Python 3.3.1 (default, Apr  8 2013, 17:54:08) 
[GCC 4.2.1 Compatible Apple Clang 4.0 ((tags/Apple/clang-421.0.57))] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import gzip
>>> handle = gzip.open("cor6_6.gb.bgz", "rb")
>>> data = handle.read()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/pjcock/lib/python3.3/gzip.py", line 359, in read
    while self._read(readsize):
  File "/Users/pjcock/lib/python3.3/gzip.py", line 432, in _read
    if not self._read_gzip_header():
  File "/Users/pjcock/lib/python3.3/gzip.py", line 305, in _read_gzip_header
    self._read_exact(struct.unpack("<H", self._read_exact(2)))
  File "/Users/pjcock/lib/python3.3/gzip.py", line 282, in _read_exact
    data = self.fileobj.read(n)
  File "/Users/pjcock/lib/python3.3/gzip.py", line 81, in read
    return self.file.read(size)
TypeError: integer argument expected, got 'tuple'
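The TypeError at the bottom of the traceback can be reproduced in isolation. This small sketch (variable names are illustrative) shows that struct.unpack returns a 1-tuple even for a single field, that a file-like object's read() rejects a tuple argument with TypeError, and that unpacking the tuple first gives the intended byte count. io.BytesIO stands in for the real file object here, and the exact error message varies between Python versions.

```python
import io
import struct

xlen_field = struct.pack("<H", 6)  # a two-byte little-endian XLEN of 6

result = struct.unpack("<H", xlen_field)
print(result)  # (6,): a tuple, even for a single field

buf = io.BytesIO(b"\x00" * 16)  # stand-in for the underlying file object
rejected = False
try:
    buf.read(result)  # what the buggy line effectively passes to read()
except TypeError as exc:
    rejected = True
    print("TypeError:", exc)

(xlen,) = struct.unpack("<H", xlen_field)  # unpack the 1-tuple first
skipped = buf.read(xlen)
print(len(skipped))  # 6
```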


The bug is very simple: an error in the _read_gzip_header method of gzip.py (line 305 in the traceback above):

    if flag & FEXTRA:
        # Read & discard the extra field, if present
        self._read_exact(struct.unpack("<H", self._read_exact(2)))

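struct.unpack is a function that always returns a tuple (here a single-element tuple), so the tuple itself is passed to _read_exact and on to the underlying file's read(). A fix is to unpack the length first: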

    if flag & FEXTRA:
        # Read & discard the extra field, if present
        extra_len, = struct.unpack("<H", self._read_exact(2))
        self._read_exact(extra_len)
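For reference, here is a minimal standalone sketch of the corrected header handling, following the RFC 1952 layout (optional FEXTRA, then FNAME, FCOMMENT and FHCRC, in that order). skip_gzip_header is a hypothetical helper, not part of the gzip module. The demo patches a dummy FEXTRA field into the output of gzip.compress, skips the header, and checks the payload still decompresses; the header CRC16 is skipped, not verified.

```python
import gzip
import io
import struct
import zlib

# Header flag bits from RFC 1952.
FHCRC, FEXTRA, FNAME, FCOMMENT = 0x02, 0x04, 0x08, 0x10


def _skip_zero_terminated(fp) -> None:
    # FNAME and FCOMMENT are zero-terminated byte strings.
    while fp.read(1) not in (b"\x00", b""):
        pass


def skip_gzip_header(fp) -> int:
    """Hypothetical helper: consume one gzip member header, return its FLG byte."""
    fixed = fp.read(10)
    if len(fixed) != 10 or fixed[:2] != b"\x1f\x8b":
        raise ValueError("not a gzip member header")
    flag = fixed[3]
    if flag & FEXTRA:
        # struct.unpack returns a 1-tuple, so unpack it before using the length.
        (extra_len,) = struct.unpack("<H", fp.read(2))
        if len(fp.read(extra_len)) != extra_len:
            raise EOFError("truncated extra field")
    if flag & FNAME:
        _skip_zero_terminated(fp)
    if flag & FCOMMENT:
        _skip_zero_terminated(fp)
    if flag & FHCRC:
        fp.read(2)  # header CRC16, not verified in this sketch
    return flag


# Patch a dummy "BC" extra subfield (BSIZE not meaningful here) into a gzip stream.
original = gzip.compress(b"hello, BGZF")
extra = b"BC" + struct.pack("<HH", 2, 0)
patched = (
    original[:3]
    + bytes([original[3] | FEXTRA])
    + original[4:10]
    + struct.pack("<H", len(extra))
    + extra
    + original[10:]
)

fp = io.BytesIO(patched)
flag = skip_gzip_header(fp)
# After the header comes a raw deflate stream followed by the 8-byte trailer.
payload = zlib.decompressobj(-zlib.MAX_WBITS).decompress(fp.read())
print(bool(flag & FEXTRA), payload)
```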

This bug was identified via failing Biopython unit tests under Python 2.7.4 and 3.3.1, which all pass with this minor fix applied.
History
Date                 User   Action  Args
2013-04-08 17:55:12  maubp  set     recipients: + maubp
2013-04-08 17:55:12  maubp  set     messageid: <1365443712.06.0.547975068624.issue17666@psf.upfronthosting.co.za>
2013-04-08 17:55:12  maubp  link    issue17666 messages
2013-04-08 17:55:11  maubp  create