Title: tarfile.TarFile.getmembers misses some entries
Messages (5)
msg145393 - Author: Sebastien Binet (bins) Date: 2011-10-12 12:54
hi there,

it seems tarfile in Python 3.2.2 (as installed on Arch Linux; I don't see any additional patch applied on top of the vanilla sources) has trouble listing the complete contents of a tarball.

$ wget

$ md5sum boost_1_44_0.tar.gz 
085fce4ff2089375105d72475d730e15  boost_1_44_0.tar.gz

$ python --version
Python 3.2.2

$ python2 --version
Python 2.7.2

$ python ./
>>> 8145

$ python2 ./ 
>>> 33635

where is:
import tarfile
o = tarfile.open("boost_1_44_0.tar.gz")
print(">>> %s" % len(o.getmembers()))
## EOF ##

is it a known bug ?

(this of course prevents TarFile.extractall from being useful w/ python3...)

msg145447 - Author: Sebastien Binet (bins) Date: 2011-10-13 08:28
one interesting additional piece of information is that if I un-tar that file and re-tar it w/o gzip compression, getmembers gets the right answer.
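
A minimal sketch of that comparison, for anyone who wants to reproduce it. The helper name count_members is made up for illustration, and the name of the re-tarred file is whatever was chosen when re-creating the archive:

```python
import tarfile


def count_members(path):
    """Return how many entries tarfile finds in the archive at `path`.

    Hypothetical helper, not part of the tarfile module.
    """
    # Mode "r:*" lets tarfile detect gzip/bz2/no compression by itself.
    with tarfile.open(path, "r:*") as archive:
        return len(archive.getmembers())
```

Calling it once on boost_1_44_0.tar.gz and once on the uncompressed re-tarred copy should print the same number on an unaffected interpreter; a mismatch reproduces the behaviour described here.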

msg145503 - Author: Roundup Robot (python-dev) (Python triager) Date: 2011-10-14 10:54
New changeset 341008eab87d by Lars Gustäbel in branch '3.2':
Issue #13158: Fix decoding and encoding of base-256 number fields in tarfile.

New changeset 158430b2b552 by Lars Gustäbel in branch 'default':
Merge with 3.2: Issue #13158: Fix decoding and encoding of base-256 number fields in tarfile.
msg145504 - Author: Lars Gustäbel (lars.gustaebel) * (Python committer) Date: 2011-10-14 10:58
Thanks for the report. There was a problem decoding a special and rare kind of header field in the archive. The format of the archive is of very bad quality BTW ;-)
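
For readers unfamiliar with the field type named in the changeset: tar headers normally store numbers as octal ASCII text, but the GNU format also allows a base-256 (binary) encoding, flagged by a first byte of 0x80 for positive values or 0xFF for negative ones. The following is only a rough sketch of how such a field can be decoded. It is not the committed patch, and decode_number_field is a made-up name, not a tarfile API:

```python
def decode_number_field(field: bytes) -> int:
    """Decode a tar header number field (illustrative helper only)."""
    if field[0] in (0x80, 0xFF):
        # Base-256: the bytes after the marker form a big-endian integer.
        value = int.from_bytes(field[1:], "big")
        if field[0] == 0xFF:
            # Negative numbers are stored in two's complement form.
            value -= 256 ** (len(field) - 1)
        return value
    # Usual case: octal digits, padded with spaces and/or NUL bytes.
    text = field.split(b"\0", 1)[0].decode("ascii").strip()
    return int(text or "0", 8)
```

With this sketch, an octal field such as b"0000644\0" decodes to 0o644, and b"\x80" followed by seven big-endian bytes decodes to the integer those bytes encode.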
msg145505 - Author: Sebastien Binet (bins) Date: 2011-10-14 11:05

> The format of the archive is of very bad quality BTW ;-)
well, that's C++ :P

