
Author r.david.murray
Recipients Laurent.Mazuel, r.david.murray, vstinner
Date 2014-01-21.15:33:30
If you live in a current POSIX world, this might make sense.  However, one can also argue that the filename should be *transcoded* from the tarfile encoding to the local filesystem filename encoding, which I believe is what we are currently doing.  If you are using POSIX as the locale, that transcoding will fail for a lot of non-ASCII filenames.  If you use a sensible modern locale that includes UTF-8, you wouldn't have a problem.
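
To make the transcoding behavior concrete, here is a minimal sketch.  It is not tarfile's actual code: `transcode_name` is a hypothetical helper, and the Latin-1 archive encoding and the sample filename are assumptions chosen only for illustration.  It shows why the same name transcodes fine to UTF-8 but fails when the filesystem encoding is ASCII, as it is under the POSIX locale.

```python
def transcode_name(raw: bytes, archive_encoding: str, fs_encoding: str) -> bytes:
    """Decode a stored name with the archive's declared encoding, then
    re-encode it for the filesystem (hypothetical helper, illustration only)."""
    text = raw.decode(archive_encoding)
    return text.encode(fs_encoding)


# "café.txt" as Latin-1 bytes, the way an archive might store it (assumed).
stored = b"caf\xe9.txt"

# A UTF-8 filesystem encoding can represent the name.
utf8_name = transcode_name(stored, "latin-1", "utf-8")

# An ASCII filesystem encoding cannot represent "é", so encoding raises.
try:
    transcode_name(stored, "latin-1", "ascii")
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True
```

Note that the result under UTF-8 is a different byte sequence from the one stored in the archive, which is exactly the difference between transcoding and preserving the original bytes.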

Unfortunately, the reality is probably that sometimes you want one behavior and sometimes you want the other :(

Encoding using member.encoding is probably wrong, though.  If you are trying to preserve the original bytes, it is probably best to do so, and not assume that the tarfile encoding field is valid.
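
As a rough illustration of what preserving the original bytes could look like, the sketch below uses the `surrogateescape` error handler (PEP 383) to round-trip a name without trusting any declared encoding.  The helper names `name_from_bytes` and `bytes_from_name` are hypothetical, and the sample bytes are an assumption; this is one possible approach, not what tarfile does today.

```python
def name_from_bytes(raw: bytes, encoding: str = "utf-8") -> str:
    """Decode a name, smuggling undecodable bytes through as lone surrogates
    (hypothetical helper, illustration only)."""
    return raw.decode(encoding, "surrogateescape")


def bytes_from_name(name: str, encoding: str = "utf-8") -> bytes:
    """Reverse name_from_bytes, turning the surrogates back into the raw bytes."""
    return name.encode(encoding, "surrogateescape")


# Latin-1 bytes for "café.txt"; these are not valid UTF-8 (assumed sample).
raw = b"caf\xe9.txt"

# Round-tripping with surrogateescape gives back the original bytes.
round_tripped = bytes_from_name(name_from_bytes(raw))

# Trusting a guessed encoding and transcoding produces different bytes.
transcoded = raw.decode("latin-1").encode("utf-8")
```

Here `round_tripped` is identical to `raw`, while `transcoded` is not, so which one you want depends on whether the goal is a readable name in the local encoding or a byte-for-byte copy of what was in the archive.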

I'm adding Victor Stinner to nosy: he's thought about these issues much more deeply than I have.  The answer may be that we will only support transcoding filenames in our tarfile module...and certainly it looks like doing anything else, even if we want to, would be a new feature.
Date User Action Args
2014-01-21 15:33:31 r.david.murray set recipients: + r.david.murray, vstinner, Laurent.Mazuel
2014-01-21 15:33:31 r.david.murray set messageid: <>
2014-01-21 15:33:31 r.david.murray link issue20329 messages
2014-01-21 15:33:30 r.david.murray create