Issue7011
Created on 2009-09-28 09:21 by do3cc, last changed 2022-04-11 14:56 by admin. This issue is now closed.
Messages (2) | |||
---|---|---|---|
msg93198 - (view) | Author: Patrick Gerken (do3cc) | Date: 2009-09-28 09:21 | |
Sadly, I was unable to debug this far enough to provide a thorough test case. I can provide information on how to reproduce the problem on request. I have a tar file and a diff to tarfile.py with some pdb breakpoints that only get activated in the middle of the file, just before the problematic data.

Installing an egg fails, and setuptools eats the original error. The original error is this: ValueError: 'invalid literal for int(): \xcf\xcf\xdf\xfc\xe9\xcd\xa9\xa9' That happens in the call to next in the class TarFile. Here we read in a chunk of file data and let TarInfo parse it. But the chunk of data is actually the beginning of an image in the tar file.

Here is a more thorough report of my pdb findings:

Environment: I created an egg on Linux, which resulted in a tar.gz file. Installing that egg fails because the tarfile library has problems reading the tar file. tar itself can extract the full file without problems. I have a self-compiled Python 2.4.6.

The last entry that is apparently read correctly from TarFile.next is a LONGLINK, tarinfo.type == 'L'. This type has a callback method in TarInfo.TYPE_METH, which it uses for returning the real TarInfo object. That goes into proc_gnulong of tarfile.py. This proc_gnulong method calls next again, to get the real file info, I think. The next buffer that is read contains a file name that is exactly 100 chars long and seems to be a directory, because it has a trailing slash, but its file type is '0'.

I suspected the error here, and to cross-check, I looked at the output of "tar -tf" on the tar file. I expect tar to return the file names in the same order as Python reads them in. Just before the directory that next seems to find, tar lists its parent directory. The previous tarinfo object is exactly about this parent directory. So it looks like we actually have a directory entry here.

Enough wild guesses, and more observations: The next call of TarFile.next() creates a TarInfo object again. Here, at about line 693, it checks if the file is a regular file but its name ends with a slash. If so, it changes the file type from '0', regular file, to '5', DIRTYPE. It actually does that with our TarInfo object. The TarInfo object is created successfully and the next method continues to run.

Then, around line 1650, there is a check, if tarinfo.isreg() or tarinfo.type not in SUPPORTED_TYPES:... Here the offset for reading the next TarInfo buffer is increased by the size of the file data in the tar file. But not for our TarInfo object, because it is no longer a regular file. If I pad the offset manually, everything continues to work. But I won't do it this time.

The call to next finishes, and after a while TarFile.next() is called again. This time, next tries to read a chunk of data again, but the chunk is actual file content. It starts with 'GIF89a...', which makes sense, since the directory contains images. Here parsing of the tar file fails. |
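The offset problem described above can be modeled in a few lines. This is only a simplified sketch, not the code from tarfile.py: the function name and parameters are hypothetical, and the only facts assumed are that tar headers occupy one 512-byte block and that file data is padded to whole blocks.

```python
BLOCKSIZE = 512  # tar archives are organised in 512-byte blocks


def next_header_offset(offset, size, is_regular):
    """Return where the reader will look for the next header.

    Hypothetical model of the check quoted in the report: the data
    blocks are skipped only when the member counts as a regular file.
    """
    offset += BLOCKSIZE  # step over the header block itself
    if is_regular:
        # Data is padded up to a whole number of blocks.
        blocks, remainder = divmod(size, BLOCKSIZE)
        if remainder:
            blocks += 1
        offset += blocks * BLOCKSIZE
    return offset


# A 1000-byte image whose header starts at offset 0.
as_file = next_header_offset(0, 1000, is_regular=True)
as_directory = next_header_offset(0, 1000, is_regular=False)
```

With the member treated as a regular file, the reader lands after the two padded data blocks. With the type flipped to a directory, it lands directly on the first data block, which is where the image bytes would then be parsed as a header.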
|||
msg93201 - (view) | Author: Patrick Gerken (do3cc) | Date: 2009-09-28 09:49 | |
Doh, I only searched for open bugs, not for closed ones. This ticket is a duplicate of http://bugs.python.org/issue1471427, which is fixed in Python 2.5. If somebody has similar problems, here is a quick fix: I finally was able to reproduce the issue. It only happens when the path without the filename, but including the trailing slash, is exactly 100 chars long. Then, because of the trailing slash, tarfile treats the entry as a directory, and if the file itself was not empty, the next read cannot be parsed as a tar header. Since I am bound to 2.4, I will rename the directories. |
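For anyone who wants to check their own archives for this condition, here is a rough sketch written for a current Python 3, not for 2.4. The helper names and the example path are made up for illustration, and the check assumes plain ASCII names so that characters equal bytes. The round trip writes one such member into an in-memory GNU-format tar and reads it back, which could serve as a small test case on interpreters that contain the fix.

```python
import io
import tarfile

NAME_FIELD = 100  # width of the name field in a classic tar header


def hits_boundary(name):
    """True if a name longer than 100 chars has a slash as its 100th char.

    Assumption: in that case the truncated header name ends with "/",
    which is the situation described in this ticket.
    """
    return len(name) > NAME_FIELD and name[NAME_FIELD - 1] == "/"


def roundtrip(name, data):
    """Write one regular file into an in-memory GNU tar and read it back."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.GNU_FORMAT) as tar:
        info = tarfile.TarInfo(name)
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))
    buf.seek(0)
    with tarfile.open(fileobj=buf, mode="r") as tar:
        member = tar.next()
        return member.name, member.isreg(), tar.extractfile(member).read()


# Hypothetical path: a 99-character directory plus "/" fills the name field.
example = "d" * 99 + "/image.gif"
result = roundtrip(example, b"GIF89a")
```

On an affected 2.4 installation, running hits_boundary over the output of "tar -tf" would be one way to find the directories that need renaming.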
History | |||
---|---|---|---|
Date | User | Action | Args |
2022-04-11 14:56:53 | admin | set | github: 51260 |
2009-09-28 09:49:18 | do3cc | set | status: open -> closed messages: + msg93201 |
2009-09-28 09:21:22 | do3cc | create |