Message 150024 - Python tracker



Author	jaraco
Recipients	RonnyPfannschmidt, alexis, eric.araujo, jaraco, jens, lars.gustaebel, mikehoy, mu_mind, tarek, vstinner
Date	2011-12-21.17:35:47
SpamBayes Score	5.625406e-07
Marked as misclassified	No
Message-id	<1324488948.91.0.745330117699.issue11638@psf.upfronthosting.co.za>
In-reply-to

Content

> > Encoding to 'utf-8' or the default file system encoding doesn't seem
> > right (as the characters end up getting stored in the gzip archive itself).
> I don’t understand.

The characters are stored in the gzip archive itself, as part of the gzip header (the original file name field). A comment in the Python 3 trunk indicates the encoding should be iso-8859-1: https://bitbucket.org/mirror/cpython/src/f3041e7f535d/Lib/tarfile.py#cl-475

My point is that the file system encoding is not relevant here. Because the name is being stored in a gzip blob, it should be encoded according to the gzip specification (RFC 1952), which defines that field as ISO 8859-1.
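For reference, here is a minimal sketch of how a name ends up in a gzip header under RFC 1952: the FNAME field is a zero-terminated ISO 8859-1 string, so the name is encoded with latin-1 no matter what the file system encoding is. It is written for Python 3, and `gzip_member` is a hypothetical helper for illustration, not the code in tarfile or gzip.

```python
import gzip
import struct
import zlib

FNAME = 0x08  # RFC 1952 FLG bit: an original file name follows the header


def gzip_member(data: bytes, name: str, mtime: int = 0) -> bytes:
    """Build one gzip member whose FNAME field carries `name` (hypothetical helper)."""
    # RFC 1952: FNAME is zero-terminated ISO 8859-1 text, independent of
    # the file system encoding. Non-Latin-1 characters raise UnicodeEncodeError.
    fname = name.encode("latin-1")
    if b"\x00" in fname:
        raise ValueError("FNAME must not contain NUL bytes")
    # ID1, ID2, CM=8 (deflate), FLG=FNAME, MTIME, XFL=0, OS=255 (unknown)
    header = struct.pack("<BBBBIBB", 0x1F, 0x8B, 8, FNAME, mtime, 0, 255)
    # Negative wbits yields a raw deflate stream with no zlib wrapper.
    compressor = zlib.compressobj(6, zlib.DEFLATED, -zlib.MAX_WBITS)
    body = compressor.compress(data) + compressor.flush()
    # Trailer: CRC-32 and ISIZE (input length modulo 2**32), little-endian.
    trailer = struct.pack("<II", zlib.crc32(data) & 0xFFFFFFFF, len(data) & 0xFFFFFFFF)
    return header + fname + b"\x00" + body + trailer


member = gzip_member(b"hello", "caf\u00e9.txt")
print(member[10:19])             # b'caf\xe9.txt\x00': the Latin-1 header name
print(gzip.decompress(member))   # b'hello'
```

Any standard gzip reader skips the FNAME field when decompressing, so the encoding choice only affects what tools display as the original name.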

> > Additionally, encoding as 'utf-8' would cause the file to be created
> > with a utf-8 filename, which would be undesirable.
> Why?

My concern was that if we encoded the string as utf-8 before passing it to the __builtins__.open() call, Python might encode _that_ utf-8 string again using the file system encoding and create the file under that name (a utf-8 encoded byte string rather than the intended unicode name). After further investigation, and based on the work that has been proposed, this is not a risk.
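A sketch of that separation, assuming Python 3's gzip.GzipFile: the on-disk name is passed to open() as a unicode string and Python applies the file system encoding itself, while the `filename` argument only feeds the header's FNAME field, which CPython 3's gzip module encodes as latin-1. `write_named_gzip` is a hypothetical helper for illustration.

```python
import gzip
import os
import tempfile


def write_named_gzip(directory: str, name: str, data: bytes) -> str:
    """Write `data` to <directory>/<name>.gz (hypothetical helper)."""
    path = os.path.join(directory, name + ".gz")
    # The on-disk name stays a str; no manual encoding before open().
    with open(path, "wb") as raw:
        # `filename` only sets the header name; bytes are written to `raw`.
        with gzip.GzipFile(filename=name, mode="wb", fileobj=raw, mtime=0) as gz:
            gz.write(data)
    return path


with tempfile.TemporaryDirectory() as tmp:
    path = write_named_gzip(tmp, "caf\u00e9.txt", b"hello")
    on_disk_name = os.path.basename(path)  # the unicode name, as given
    with open(path, "rb") as f:
        blob = f.read()

print(blob[10:19])  # b'caf\xe9.txt\x00': the header copy, in Latin-1
```

The two names never interact: changing the header encoding does not change how the file itself is named, and vice versa.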

History
Date	User	Action	Args
2011-12-21 17:35:48	jaraco	set	recipients: + jaraco, lars.gustaebel, vstinner, tarek, eric.araujo, RonnyPfannschmidt, alexis, mu_mind, mikehoy, jens
2011-12-21 17:35:48	jaraco	set	messageid: <1324488948.91.0.745330117699.issue11638@psf.upfronthosting.co.za>
2011-12-21 17:35:48	jaraco	link	issue11638 messages
2011-12-21 17:35:47	jaraco	create