classification
Title: zipfile UnicodeDecodeError
Type:
Stage:
Components: Unicode
Versions: Python 2.4
process
Status: closed
Resolution: wont fix
Dependencies:
Superseder:
Assigned To:
Nosy List: adamx97, georg.brandl, lemburg, smigyull, terry.reedy
Priority: normal
Keywords:

Created on 2005-03-25 02:51 by adamx97, last changed 2007-08-30 10:10 by georg.brandl. This issue is now closed.

Messages (6)
msg24778 - (view) Author: adam davis (adamx97) Date: 2005-03-25 02:51
I think this is the same as #705295, which may have
been prematurely closed.

I think the error is dependent on the data or time.

File "C:\Python24\lib\zipfile.py", line 166, in FileHeader
    return header + self.filename + self.extra
UnicodeDecodeError: 'ascii' codec can't decode byte 0xd0 in position 10: ordinal not in range(128)

The header is packed like this:
        header = struct.pack(structFileHeader, stringFileHeader,
                 self.extract_version, self.reserved, self.flag_bits,
                 self.compress_type, dostime, dosdate, CRC,
                 compress_size, file_size,
                 len(self.filename), len(self.extra))

the header is:

[Dbg]>>> header
'PK\x03\x04\x14\x00\x00\x00\x00\x00\xd0\xa9x2\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00-\x00\x00\x00'

and here are the parts that made it up:

[Dbg]>>> structFileHeader, stringFileHeader, self.extract_version, self.reserved, self.flag_bits, self.compress_type, dostime, dosdate, CRC, compress_size, file_size, len(self.filename), len(self.extra)
('<4s2B4HlLL2H', 'PK\x03\x04', 20, 0, 0, 0, 43472, 12920, 0, 0, 0, 45, 0)
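
For reference, a minimal Python 3 sketch (the report itself is against Python 2.4) that repacks the values shown above and decodes the two DOS timestamp fields. The variable names are illustrative and not taken from zipfile.py. It suggests that the 0xd0 byte at position 10 is simply the low byte of dostime:

```python
import struct

# Format string and field values copied from the debugger output above.
# In Python 3 the signature must be a bytes object.
STRUCT_FILE_HEADER = "<4s2B4HlLL2H"
fields = (b"PK\x03\x04", 20, 0, 0, 0, 43472, 12920, 0, 0, 0, 45, 0)

header = struct.pack(STRUCT_FILE_HEADER, *fields)

# Layout: 4-byte signature, two 1-byte fields, then four 2-byte fields.
# dostime is the third 2-byte field, so it occupies bytes 10-11 (little-endian).
dostime, dosdate = fields[5], fields[6]
byte_at_10 = header[10]

# Decode the MS-DOS time and date bit fields.
hour = dostime >> 11
minute = (dostime >> 5) & 0x3F
second = (dostime & 0x1F) * 2
year = (dosdate >> 9) + 1980
month = (dosdate >> 5) & 0x0F
day = dosdate & 0x1F

print(hex(byte_at_10))                          # low byte of dostime
print(year, month, day, hour, minute, second)   # file time stamp
```

Under these assumptions the time stamp decodes to 2005-03-24 21:14:32, which would explain why the error appears to depend on the time.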


here's the pieces of the 
msg24779 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2005-03-30 18:39
Your report ends with 'here's the pieces of the'.  Was something 
cut off?  If you meant to attach a file, try again.
msg24780 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2005-03-30 19:27
The problem is not the data in the file, but the fact that
your filename is probably a Unicode object which fails to
concatenate with the header (which clearly isn't ASCII :-).

I have no idea whether ZIP files support anything other than
ASCII filenames. If you have a reference, please let us know.

If your filename only contains ASCII characters, you should
be able to open the file correctly by first encoding the
filename to ASCII: filename.encode('ascii').
Perhaps zipfile.py should do that for you ?!
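
A rough illustration of the coercion described above, written for Python 3, where the implicit step no longer exists. In Python 2, `str + unicode` decodes the str operand with the default ASCII codec; the sketch performs that decode explicitly on the header from the report. The filename `example.crt` is a hypothetical stand-in, not the reporter's actual file:

```python
import struct

# Header values copied from the original report.
header = struct.pack("<4s2B4HlLL2H",
                     b"PK\x03\x04", 20, 0, 0, 0, 43472, 12920, 0, 0, 0, 45, 0)

# Python 2 would do this decode implicitly when the header (str) is
# concatenated with a unicode filename.
try:
    header.decode("ascii")
    error = None
except UnicodeDecodeError as exc:
    error = exc

print(error)

# Suggested workaround: encode the filename first, so that both operands
# are byte strings and no decoding of the header is attempted.
filename = u"example.crt"            # hypothetical ASCII-only name
record = header + filename.encode("ascii")
```

In Python 3 the unencoded concatenation `header + filename` raises a TypeError rather than a UnicodeDecodeError, so the explicit decode is only a stand-in for the Python 2 behaviour.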
msg24781 - (view) Author: adam davis (adamx97) Date: 2005-03-31 03:51
The "here's the pieces of the" was an accident, you can
ignore it.

My filename was pure ascii, something like
"myemail@test.com.crt"

It seems to me the problem is that the header isn't
decodable. 0xd0 is 208, which is > 128.
msg24782 - (view) Author: Jean-Roch Roy (smigyull) Date: 2005-11-01 22:42
Python 2.4/WinXPSP2

I'm able to reproduce this bug. Create an arbitrary file named 
"Test.bin" having a time stamp of 2005-11-01 16:15:32. Then 
run this code:

import zipfile
zipFile = zipfile.ZipFile("Test.zip", "w", zipfile.ZIP_DEFLATED)
zipFile.write(u"Test.bin")

You should see the aforementioned traceback. The problem 
occurs when 1. pack() returns a string containing bytes >= 
128 (this depends on the time stamp); and 2. write() is called 
with a unicode parameter (instead of a str parameter).
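
To see why the time stamp matters, here is a small Python 3 sketch that packs a time of day into the 16-bit MS-DOS format stored in the header and lists which of the two resulting bytes fall outside ASCII. The helper names are made up for illustration, and the 08:00:00 value is a hypothetical time stamp chosen only for contrast:

```python
import struct

def dos_time(hour, minute, second):
    """Pack a time of day into the 16-bit MS-DOS time format."""
    return (hour << 11) | (minute << 5) | (second // 2)

def non_ascii_offsets(data):
    """Return the offsets of all bytes >= 128 in a bytes object."""
    return [i for i, b in enumerate(data) if b >= 128]

# Time stamp from the reproduction steps above: 16:15:32.
failing = struct.pack("<H", dos_time(16, 15, 32))

# Hypothetical time stamp for contrast: 08:00:00.
harmless = struct.pack("<H", dos_time(8, 0, 0))

print(non_ascii_offsets(failing))
print(non_ascii_offsets(harmless))
```

With this bit layout the low byte is >= 128 whenever the minute modulo 8 is 4 or more, and the high byte is >= 128 whenever the hour is 16 or later, so whether the time bytes alone trigger the error depends on the file's time stamp.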
msg55470 - (view) Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2007-08-30 10:10
The docs say (at least the development docs) that Unicode filenames must
be encoded to str before passing them to ZipFile.write().

(This issue will have to be solved differently for Py3k, I'll look into it.)
History
Date                 User          Action  Args
2007-08-30 10:10:12  georg.brandl  set     status: open -> closed
                                           nosy: + georg.brandl
                                           resolution: wont fix
                                           messages: + msg55470
2005-03-25 02:51:55  adamx97       create