
Author serhiy.storchaka
Recipients docs@python, serhiy.storchaka, swamiyeswanth, vstinner, xuanji
Date 2012-10-25.16:17:29
Message-id <1351181850.16.0.15208403656.issue11160@psf.upfronthosting.co.za>
In-reply-to
Content
The ZIP specification says:

"""
If general purpose bit 11 is unset, the file name and comment should conform 
to the original ZIP character encoding.  If general purpose bit 11 is set, the 
filename and comment must support The Unicode Standard, Version 4.1.0 or 
greater using the character encoding form defined by the UTF-8 storage 
specification.  The Unicode Standard is published by the The Unicode
Consortium (www.unicode.org).  UTF-8 encoded data stored within ZIP files 
is expected to not include a byte order mark (BOM). 
"""

There is also an extension for a UTF-8 encoded file comment.  All this means that the file comment should be interpreted as a Unicode string.
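
As a rough illustration of bit 11 from the quoted text, the sketch below writes two members with Python's zipfile module and reads the flag back from each entry. The helper name utf8_flags and the constant UTF8_FLAG are made up for this example, and the result reflects how CPython's zipfile behaves, not something the specification requires of every writer.

```python
import io
import zipfile

UTF8_FLAG = 0x800  # general purpose bit 11 (1 << 11); name chosen for this example


def utf8_flags(names):
    """Write empty members with the given names, then report bit 11 for each."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name in names:
            zf.writestr(name, b"")
    # Re-open the archive so the flags come from the stored central directory.
    with zipfile.ZipFile(buf) as zf:
        return {info.filename: bool(info.flag_bits & UTF8_FLAG)
                for info in zf.infolist()}


flags = utf8_flags(["ascii.txt", "caf\u00e9.txt"])
print(flags)
```

With CPython, the flag is expected to be set only for the name that cannot be stored as ASCII.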

However, the specification says nothing about the encoding of the .ZIP file comment (it only states that no encryption or data authentication is applied to it).

Since changeset 4186f20d9fa4, assigning a non-bytes value to ZipFile.comment raises TypeError. I think the documentation should be clarified to say that the comment must be a bytes object.
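
A minimal sketch of the behaviour described above, assuming a Python version that includes that changeset. The UTF-8 encoding of the comment is only an assumption made for the example, since the specification does not define an encoding for the .ZIP file comment.

```python
import io
import zipfile

buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("a.txt", b"data")
    try:
        zf.comment = "archive comment"  # str, not bytes
    except TypeError:
        rejected = True
    else:
        rejected = False
    # Encoding is the caller's choice; UTF-8 here is an assumption.
    zf.comment = "archive comment".encode("utf-8")

# Re-open the archive and read the comment back; it is returned as bytes.
with zipfile.ZipFile(buf) as zf:
    stored = zf.comment

print(rejected, stored)
```

The caller has to pick an encoding explicitly, which is why the documentation should state that only bytes are accepted.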
History
Date User Action Args
2012-10-25 16:17:30 serhiy.storchaka set recipients: + serhiy.storchaka, vstinner, docs@python, xuanji, swamiyeswanth
2012-10-25 16:17:30 serhiy.storchaka set messageid: <1351181850.16.0.15208403656.issue11160@psf.upfronthosting.co.za>
2012-10-25 16:17:30 serhiy.storchaka link issue11160 messages
2012-10-25 16:17:29 serhiy.storchaka create