Message186320
Regression from Python 3.3.0 to 3.3.1, tested under Mac OS X 10.8 and CentOS Linux 64bit.
The same regression is also present in going from Python 2.7.3 to 2.7.4. Does that need a separate issue filed?
Consider this VALID GZIP file, human link:
https://github.com/biopython/biopython/blob/master/Tests/GenBank/cor6_6.gb.bgz
Binary link, only a small file:
https://raw.github.com/biopython/biopython/master/Tests/GenBank/cor6_6.gb.bgz
This is compressed using a GZIP variant called BGZF, which uses multiple blocks and records additional tags in the header. For background see:
http://blastedbio.blogspot.com/2011/11/bgzf-blocked-bigger-better-gzip.html
$ curl -O https://raw.github.com/biopython/biopython/master/Tests/GenBank/cor6_6.gb.bgz
$ cat cor6_6.gb.bgz | gunzip | wc
320 1183 14967
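As a self-contained alternative to downloading the file, the sketch below builds a small multi-block gzip stream whose headers set the FEXTRA flag, which is the feature of BGZF that matters here. The helper name `gzip_member_with_extra` and the subfield bytes are made up for illustration; this is not a real BGZF writer (a real BGZF subfield records the block size), only a minimal stream that should exercise the same header-parsing path.

```python
import gzip
import io
import struct
import zlib


def gzip_member_with_extra(payload, extra):
    """Hypothetical helper: one gzip member with the FEXTRA flag set."""
    FEXTRA = 4
    # ID1, ID2, CM=8 (deflate), FLG, MTIME=0, XFL=0, OS=255 (unknown)
    header = struct.pack("<BBBBIBB", 0x1F, 0x8B, 8, FEXTRA, 0, 0, 255)
    # XLEN (2 bytes, little endian) followed by the extra field itself
    header += struct.pack("<H", len(extra)) + extra
    # Raw deflate stream (negative wbits means no zlib header or trailer)
    compressor = zlib.compressobj(6, zlib.DEFLATED, -zlib.MAX_WBITS)
    body = compressor.compress(payload) + compressor.flush()
    # Trailer: CRC32 and uncompressed size, both 32-bit little endian
    trailer = struct.pack(
        "<II", zlib.crc32(payload) & 0xFFFFFFFF, len(payload) & 0xFFFFFFFF
    )
    return header + body + trailer


# Illustrative subfield: two ID bytes, a 2-byte length, then 2 data bytes
extra = b"XX" + struct.pack("<H", 2) + b"\x00\x00"
blob = gzip_member_with_extra(b"hello", extra) + gzip_member_with_extra(
    b" world", extra
)

with gzip.GzipFile(fileobj=io.BytesIO(blob)) as handle:
    data = handle.read()
```

On an unaffected Python, `data` should be the two payloads concatenated. On 3.3.1 or 2.7.4 the `read()` call would be expected to fail in the same way as with the BGZF file.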
Now for the bug, expected behaviour:
$ python3.2
Python 3.2 (r32:88445, Feb 28 2011, 17:04:33)
[GCC 4.2.1 (Apple Inc. build 5664)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import gzip
>>> handle = gzip.open("cor6_6.gb.bgz", "rb")
>>> data = handle.read()
>>> handle.close()
>>> len(data)
14967
>>> quit()
Broken behaviour:
$ python3.3
Python 3.3.1 (default, Apr 8 2013, 17:54:08)
[GCC 4.2.1 Compatible Apple Clang 4.0 ((tags/Apple/clang-421.0.57))] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import gzip
>>> handle = gzip.open("cor6_6.gb.bgz", "rb")
>>> data = handle.read()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/pjcock/lib/python3.3/gzip.py", line 359, in read
    while self._read(readsize):
  File "/Users/pjcock/lib/python3.3/gzip.py", line 432, in _read
    if not self._read_gzip_header():
  File "/Users/pjcock/lib/python3.3/gzip.py", line 305, in _read_gzip_header
    self._read_exact(struct.unpack("<H", self._read_exact(2)))
  File "/Users/pjcock/lib/python3.3/gzip.py", line 282, in _read_exact
    data = self.fileobj.read(n)
  File "/Users/pjcock/lib/python3.3/gzip.py", line 81, in read
    return self.file.read(size)
TypeError: integer argument expected, got 'tuple'
The bug is a simple error in line 205 of gzip.py:
203         if flag & FEXTRA:
204             # Read & discard the extra field, if present
205             self._read_exact(struct.unpack("<H", self._read_exact(2)))
The struct.unpack function always returns a tuple, here with a single element, so the length must be unpacked from it before being passed to _read_exact. Thus a fix is:
203         if flag & FEXTRA:
204             # Read & discard the extra field, if present
205             extra_len, = struct.unpack("<H", self._read_exact(2))
206             self._read_exact(extra_len)
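The tuple behaviour can be checked in isolation. This is a minimal sketch using only the struct module; the value 6 is an arbitrary stand-in for the extra field length.

```python
import struct

# Two little-endian bytes encoding an example extra field length of 6
raw = struct.pack("<H", 6)

# struct.unpack returns a tuple even when the format has a single field
result = struct.unpack("<H", raw)

# The trailing comma unpacks the one-element tuple into a plain integer
extra_len, = struct.unpack("<H", raw)
```

Here `result` is the tuple `(6,)`, which is what the broken line passes on to `read()`, while `extra_len` is the integer `6`.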
This bug was identified via failing Biopython unit tests under Python 2.7.4 and 3.3.1, which all pass with this minor fix applied.
Date | User | Action | Args
2013-04-08 17:55:12 | maubp | set | recipients: + maubp
2013-04-08 17:55:12 | maubp | set | messageid: <1365443712.06.0.547975068624.issue17666@psf.upfronthosting.co.za>
2013-04-08 17:55:12 | maubp | link | issue17666 messages
2013-04-08 17:55:11 | maubp | create |