
Author lars.gustaebel
Recipients BreamoreBoy, ezio.melotti, hynek, lars.gustaebel, vinay.sajip
Date 2014-07-08.10:40:11
Message-id <1404816012.41.0.932157710788.issue17153@psf.upfronthosting.co.za>
Content
IIRC, tarfile under 2.7 has never been explicitly unicode-safe; support for unicode objects is heterogeneous at best. The obvious workaround is to work exclusively with str objects.

What we can't do is decode the utf-8 pathname from the archive to a unicode object, because we have no way to detect an archive's encoding. We can either emit a warning if the user passes a unicode object to extract(), or implicitly encode the passed unicode object using TarFile.encoding so that os.path.join() succeeds.
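
To make the two options concrete, here is a minimal sketch, not the actual tarfile code. coerce_extract_path() is a hypothetical helper, and its encoding parameter stands in for TarFile.encoding; the utf-8 default and the choice of UnicodeWarning are assumptions made for illustration only.

```python
import os
import warnings

# The text type is `unicode` on Python 2 and `str` on Python 3.
text_type = type(u"")


def coerce_extract_path(path, encoding="utf-8", warn=False):
    """Hypothetical helper (not part of tarfile): return `path` as bytes.

    Text paths are encoded with `encoding`, which stands in for
    TarFile.encoding. Byte strings are returned unchanged.
    """
    if isinstance(path, text_type):
        if warn:
            # Option 1: tell the caller that a unicode path was passed.
            warnings.warn(
                "unicode path passed to extract(); encoding it with %r"
                % encoding,
                UnicodeWarning,
                stacklevel=2,
            )
        # Option 2: encode implicitly so the later join sees only bytes.
        return path.encode(encoding)
    return path


# Under 2.7, member names read from an archive are byte strings.
member_name = b"caf\xc3\xa9.txt"

# Both arguments are byte strings now, so os.path.join() never has to
# combine a unicode object with non-ASCII bytes.
target = os.path.join(coerce_extract_path(u"out"), member_name)
```

Whether extract() should warn, encode silently, or do both is exactly the open question; the sketch only shows that either choice is a small change at the point where the destination path enters the function.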

Unfortunately, I am not sure whether there was a rationale behind the current behaviour of extract(). This needs more inspection.
History
Date User Action Args
2014-07-08 10:40:12 lars.gustaebel set recipients: + lars.gustaebel, vinay.sajip, ezio.melotti, BreamoreBoy, hynek
2014-07-08 10:40:12 lars.gustaebel set messageid: <1404816012.41.0.932157710788.issue17153@psf.upfronthosting.co.za>
2014-07-08 10:40:12 lars.gustaebel link issue17153 messages
2014-07-08 10:40:11 lars.gustaebel create