classification
Title: tarfile.TarFile.getmembers misses some entries
Type: behavior Stage: resolved
Components: Versions: Python 3.2, Python 3.3
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: lars.gustaebel Nosy List: bins, lars.gustaebel, python-dev
Priority: normal Keywords:

Created on 2011-10-12 12:54 by bins, last changed 2011-10-14 11:05 by bins. This issue is now closed.

Messages (5)
msg145393 - Author: Sebastien Binet (bins) Date: 2011-10-12 12:54
hi there,

It seems tarfile in Python 3.2.2 has trouble listing the complete contents of a tarball. This is the build installed on Arch Linux, but I don't see any additional patches applied on top of the vanilla sources:
http://projects.archlinux.org/svntogit/packages.git/tree/trunk/PKGBUILD?h=packages/python

see:
$ wget http://downloads.sourceforge.net/sourceforge/boost/boost_1_44_0.tar.gz

$ md5sum boost_1_44_0.tar.gz 
085fce4ff2089375105d72475d730e15  boost_1_44_0.tar.gz

$ python --version
Python 3.2.2

$ python2 --version
Python 2.7.2

$ python ./foo.py
>>> 8145

$ python2 ./foo.py 
>>> 33635

where foo.py is:
##
import tarfile
o = tarfile.open("boost_1_44_0.tar.gz")
print(">>> %s" % len(o.getmembers()))
o.close()
## EOF ##


Is this a known bug?

(This, of course, makes TarFile.extractall useless with Python 3...)

-s
msg145447 - Author: Sebastien Binet (bins) Date: 2011-10-13 08:28
One interesting additional detail: if I untar that file and re-tar it without gzip compression, getmembers returns the right number of members.

-s
msg145503 - Author: Roundup Robot (python-dev) Date: 2011-10-14 10:54
New changeset 341008eab87d by Lars Gustäbel in branch '3.2':
Issue #13158: Fix decoding and encoding of base-256 number fields in tarfile.
http://hg.python.org/cpython/rev/341008eab87d

New changeset 158430b2b552 by Lars Gustäbel in branch 'default':
Merge with 3.2: Issue #13158: Fix decoding and encoding of base-256 number fields in tarfile.
http://hg.python.org/cpython/rev/158430b2b552
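The changesets above concern base-256 number fields. In a tar header, numeric fields such as mode, uid, size and mtime are normally zero-padded ASCII octal; GNU tar's extension stores values that don't fit as big-endian binary, flagged by a leading 0x80 byte (positive) or 0xFF byte (negative, two's complement over the whole field). The following is a minimal sketch of both directions under that GNU convention. `decode_number` and `encode_number` are hypothetical names, not part of the tarfile API, and this is not the code from the changeset.

```python
# Hypothetical helpers (not the tarfile API) sketching the GNU base-256 convention.

def decode_number(field: bytes) -> int:
    """Decode a numeric tar header field (octal ASCII or GNU base-256)."""
    if field and field[0] == 0x80:
        # Positive base-256: the bytes after the 0x80 marker are a big-endian integer.
        return int.from_bytes(field[1:], "big")
    if field and field[0] == 0xFF:
        # Negative base-256: the whole field is a big-endian two's-complement integer.
        return int.from_bytes(field, "big", signed=True)
    # Classic encoding: octal digits, NUL-terminated and/or padded with spaces.
    text = field.split(b"\0", 1)[0].decode("ascii").strip()
    return int(text, 8) if text else 0


def encode_number(n: int, size: int) -> bytes:
    """Encode n into a size-byte field, preferring octal, falling back to base-256."""
    if 0 <= n < 8 ** (size - 1):
        # size - 1 zero-padded octal digits plus a NUL terminator.
        return ("%0*o" % (size - 1, n)).encode("ascii") + b"\0"
    if 0 <= n < 256 ** (size - 1):
        # 0x80 marker followed by size - 1 big-endian bytes.
        return b"\x80" + n.to_bytes(size - 1, "big")
    if -(256 ** (size - 1)) <= n < 0:
        # Two's complement over the full field; its first byte is then 0xFF.
        return n.to_bytes(size, "big", signed=True)
    raise ValueError("%d does not fit in a %d-byte tar number field" % (n, size))
```

For example, the 12-byte size field holds at most 11 octal digits, so any member of 8**11 bytes (8 GiB) or more needs the base-256 form. Preferring octal whenever the value fits keeps the header readable by tools that don't know the extension.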
msg145504 - Author: Lars Gustäbel (lars.gustaebel) * (Python committer) Date: 2011-10-14 10:58
Thanks for the report. There was a problem decoding a special and rare kind of header field in the archive. The format of the archive is of very bad quality BTW ;-)
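To check whether a given archive actually contains such fields, one can walk its raw 512-byte headers without going through tarfile at all. The following is a minimal sketch, assuming plain ustar/GNU headers; pax extended headers and GNU long-name entries are simply skipped like ordinary members, and only the mode, uid, gid, size and mtime fields are checked. `base256_fields` is a hypothetical helper, not part of tarfile.

```python
import gzip
import sys
from typing import BinaryIO, Iterator, Tuple

BLOCK = 512

# (field name, offset, length) of the numeric ustar/GNU header fields checked here.
NUMERIC_FIELDS = [
    ("mode", 100, 8),
    ("uid", 108, 8),
    ("gid", 116, 8),
    ("size", 124, 12),
    ("mtime", 136, 12),
]


def _number(field: bytes) -> int:
    """Decode an octal-ASCII or GNU base-256 numeric field (hypothetical helper)."""
    if field[0] == 0x80:
        return int.from_bytes(field[1:], "big")
    if field[0] == 0xFF:
        return int.from_bytes(field, "big", signed=True)
    text = field.split(b"\0", 1)[0].decode("ascii").strip()
    return int(text, 8) if text else 0


def base256_fields(fileobj: BinaryIO) -> Iterator[Tuple[int, str, str]]:
    """Yield (header offset, member name, field name) for each base-256 field."""
    offset = 0
    while True:
        header = fileobj.read(BLOCK)
        if len(header) < BLOCK or header == bytes(BLOCK):
            return  # zero block = end-of-archive marker (or a truncated stream)
        name = header[:100].split(b"\0", 1)[0].decode("utf-8", "replace")
        for field_name, start, _length in NUMERIC_FIELDS:
            if header[start] in (0x80, 0xFF):
                yield offset, name, field_name
        # Skip the member's data, which is padded to a whole number of blocks.
        data_blocks = (_number(header[124:136]) + BLOCK - 1) // BLOCK
        fileobj.seek(data_blocks * BLOCK, 1)
        offset += BLOCK * (1 + data_blocks)


if __name__ == "__main__":
    # Usage: python scan.py boost_1_44_0.tar.gz
    with gzip.open(sys.argv[1], "rb") as f:
        for offset, name, field in base256_fields(f):
            print("%10d  %-6s %s" % (offset, field, name))
```

Reading through `gzip.open` means the same function works for compressed and uncompressed input; seeking forward relative to the current position lets it skip member data without loading it into memory.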
msg145505 - Author: Sebastien Binet (bins) Date: 2011-10-14 11:05
thanks!

> The format of the archive is of very bad quality BTW ;-)
well, that's C++ :P

-s
History
Date                 User            Action  Args
2011-10-14 11:05:30  bins            set     messages: + msg145505
2011-10-14 10:58:22  lars.gustaebel  set     status: open -> closed
                                             resolution: fixed
                                             messages: + msg145504
                                             stage: resolved
2011-10-14 10:54:47  python-dev      set     nosy: + python-dev
                                             messages: + msg145503
2011-10-13 08:28:51  bins            set     messages: + msg145447
2011-10-13 08:21:33  lars.gustaebel  set     assignee: lars.gustaebel
                                             nosy: + lars.gustaebel
                                             versions: + Python 3.3
2011-10-12 12:54:32  bins            create