Author r.david.murray
Recipients Laurent.Mazuel, r.david.murray, vstinner
Date 2014-01-21.15:33:30
Message-id <1390318411.18.0.548313596381.issue20329@psf.upfronthosting.co.za>
Content
If you live in a current-POSIX world, this might make sense. However, one can also argue that the filename should be *transcoded* from the tarfile encoding to the local filesystem filename encoding, which I believe is what we are currently doing. That approach will fail a lot if you are using POSIX as the locale. If you use a sensible modern locale that includes UTF-8, you wouldn't have a problem.
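
The transcoding path described above can be sketched with the stdlib `tarfile` module. The member name is decoded from the archive's header bytes to `str` using the tar encoding, and later re-encoded for the filesystem. This is a simplified model, not the module's actual extraction code. The helper names (`make_archive`, `member_names`, `transcode_for_fs`) are hypothetical, and `transcode_for_fs` only approximates what happens when a `str` path reaches the OS on POSIX, where Python uses the filesystem encoding with the `surrogateescape` handler. The ASCII case stands in for a POSIX/C locale of that era.

```python
import io
import tarfile
from typing import List


def make_archive(name: str, encoding: str = "utf-8") -> bytes:
    """Build an in-memory GNU-format tar with one member called `name`.

    The header stores the name as name.encode(encoding).
    """
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.GNU_FORMAT,
                      encoding=encoding) as tf:
        data = b"hello\n"
        info = tarfile.TarInfo(name=name)
        info.size = len(data)
        tf.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def member_names(archive: bytes, encoding: str = "utf-8") -> List[str]:
    """Transcoding step 1: decode header bytes to str with the tar encoding."""
    with tarfile.open(fileobj=io.BytesIO(archive), mode="r",
                      encoding=encoding) as tf:
        return tf.getnames()


def transcode_for_fs(name: str, fs_encoding: str) -> bytes:
    """Transcoding step 2 (hypothetical approximation): str -> filesystem bytes.

    'surrogateescape' only maps lone surrogates back to bytes, so an ordinary
    non-ASCII character such as 'é' still fails under an ASCII encoding.
    """
    return name.encode(fs_encoding, "surrogateescape")


if __name__ == "__main__":
    names = member_names(make_archive("café.txt"))
    print(names)  # ['café.txt']
    print(transcode_for_fs(names[0], "utf-8"))  # b'caf\xc3\xa9.txt'
    try:
        transcode_for_fs(names[0], "ascii")  # assumed stand-in for a POSIX/C locale
    except UnicodeEncodeError as exc:
        print("ASCII filesystem encoding fails:", exc.reason)
```

The same decoded name succeeds or fails purely depending on the target filesystem encoding, which is the locale dependence described above.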

Unfortunately, the reality is probably that sometimes you want one behavior and sometimes you want the other :(

Encoding using member.encoding is probably wrong, though. If you are trying to preserve the original bytes, it is probably best to do so directly, and not assume that the tarfile encoding field is valid.
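
One way to preserve the original bytes without trusting any declared encoding is to decode header names as UTF-8 with the `surrogateescape` error handler and then encode them back the same way. Per PEP 383 this round-trips any byte sequence. The sketch below is a hypothetical helper, not an existing `tarfile` option. It assumes GNU/ustar-style headers only; PAX extended headers store names differently and are not covered here.

```python
import io
import tarfile
from typing import List


def make_archive(name: str, encoding: str) -> bytes:
    """Build an in-memory GNU-format tar whose header holds name.encode(encoding)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.GNU_FORMAT,
                      encoding=encoding) as tf:
        data = b"hello\n"
        info = tarfile.TarInfo(name=name)
        info.size = len(data)
        tf.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def raw_member_names(archive: bytes) -> List[bytes]:
    """Hypothetical helper: return each member name as the exact header bytes.

    Valid UTF-8 decodes normally; every invalid byte becomes a lone surrogate
    that encodes back to the same byte, so no archive encoding is trusted.
    """
    with tarfile.open(fileobj=io.BytesIO(archive), mode="r",
                      encoding="utf-8", errors="surrogateescape") as tf:
        return [m.name.encode("utf-8", "surrogateescape")
                for m in tf.getmembers()]


if __name__ == "__main__":
    # A Latin-1 name survives byte-for-byte even though it is not valid UTF-8.
    print(raw_member_names(make_archive("café.txt", "latin-1")))  # [b'caf\xe9.txt']
```

With the raw bytes in hand, a POSIX caller could build destination paths as `bytes` (for example with `os.path.join` on `bytes` arguments) rather than transcoding through the filesystem encoding.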

I'm adding Victor Stinner to nosy: he's thought about these issues much more deeply than I have.  The answer may be that we will only support transcoding filenames in our tarfile module...and certainly it looks like doing anything else, even if we want to, would be a new feature.
History
Date User Action Args
2014-01-21 15:33:31  r.david.murray  set  recipients: + r.david.murray, vstinner, Laurent.Mazuel
2014-01-21 15:33:31  r.david.murray  set  messageid: <1390318411.18.0.548313596381.issue20329@psf.upfronthosting.co.za>
2014-01-21 15:33:31  r.david.murray  link  issue20329 messages
2014-01-21 15:33:30  r.david.murray  create