
Author ezio.melotti
Recipients ezio.melotti, lars.gustaebel, pbienst
Date 2010-01-16.17:59:04
Message-id <1263664747.56.0.969112893373.issue7693@psf.upfronthosting.co.za>
Lars, I think the situation can still be improved. If tarfile works with byte strings, it should accept only byte strings or unicode strings that can be encoded in ASCII, and it should encode the latter as soon as it receives them.
In the problem Peter reported, he was passing u".", which is an ASCII-only unicode string. Later in the program this string gets mixed with a byte string, and this causes an implicit decoding, i.e. the byte string is turned into unicode (which can fail if the filename is non-ASCII). Even if the decoding succeeds, tarfile will eventually have to convert the unicode string back into a byte string.

A better approach would be to encode every unicode string that is passed in with the ASCII codec.
If the unicode strings are ASCII-only (like the u"." Peter was passing), they can be encoded without problems. When they later get mixed with other strings, all the strings involved are byte strings, so no implicit decoding happens.
If the unicode strings are non-ASCII, the encoding will fail immediately and warn the user that they have to encode the unicode string themselves before passing it to the function.
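The proposal can be sketched as a small normalization step applied to every path argument on entry. This is only an illustration, not tarfile's actual code: the helper name to_ascii_bytes is hypothetical, and the sketch is written for Python 3, where bytes and str are distinct types (the report concerns Python 2, where str is the byte string type and unicode is the text type).

```python
import os


def to_ascii_bytes(name):
    """Hypothetical helper: normalize a path argument to a byte string.

    Byte strings pass through unchanged. Text strings are encoded with the
    ASCII codec right away, so non-ASCII text fails at the call site instead
    of deep inside the archive code.
    """
    if isinstance(name, bytes):
        return name
    if isinstance(name, str):
        # Raises UnicodeEncodeError for any non-ASCII character.
        return name.encode("ascii")
    raise TypeError("expected bytes or str, got %s" % type(name).__name__)


# An ASCII-only text argument, like the u"." from the report, is converted
# once, so later joins only ever combine byte strings.
base = to_ascii_bytes(".")
member = os.path.join(base, b"caf\xc3\xa9.txt")  # non-ASCII bytes are fine here

# Non-ASCII text is rejected immediately, telling the caller to encode it first.
try:
    to_ascii_bytes("caf\u00e9.txt")
except UnicodeEncodeError as exc:
    print("encode the name first:", exc.reason)
```

Encoding at the boundary means the rest of the code only ever sees one string type, so mixing never triggers an implicit conversion, and the only failure mode (non-ASCII text) surfaces where the caller can fix it.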