Message 150050 - Python tracker



Author	r.david.murray
Recipients	benjamin.peterson, gz, pitrou, poolie, r.david.murray, vila, vstinner
Date	2011-12-21.22:54:42
Message-id	<1324508083.56.0.40180991722.issue13643@psf.upfronthosting.co.za>
In-reply-to

Content

> But currently everything handling filenames as unicode on
> nix needs to worry about surrogates (that can't be encoded
> as ascii) already, or it will still be passing values that
> can't be interpreted by other processes as you highlighted
> earlier. Making utf-8 names come out correctly rather than
> as surrogates doesn't seem like it increases the burden.

And that is *exactly* the problem.  You can't assume that those other programs are expecting utf-8 on Unix.  The only thing you have to go by is the locale.  So that's what we use.  And as Haypo pointed out, unless you manipulate it, file system data gets turned back into the same bytes when it exits Python, so pre-existing code should work fine.
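To illustrate the round trip, here is a minimal sketch using the `surrogateescape` error handler directly.  The `roundtrip` helper and the sample filename are made up for this example; in real code `os.fsdecode()` and `os.fsencode()` do the equivalent on POSIX systems with whatever encoding the locale selects.

```python
def roundtrip(raw: bytes, encoding: str) -> bytes:
    """Decode bytes the way filenames are decoded, then encode them back.

    Hypothetical helper for illustration; `encoding` stands in for the
    locale's encoding, which is fixed here only to keep the example
    deterministic.
    """
    text = raw.decode(encoding, "surrogateescape")
    return text.encode(encoding, "surrogateescape")


utf8_name = "café.txt".encode("utf-8")      # valid UTF-8 bytes
latin1_name = "café.txt".encode("latin-1")  # not valid UTF-8

# Under an ASCII (C) locale the non-ASCII bytes become lone surrogates...
as_text = utf8_name.decode("ascii", "surrogateescape")
has_surrogates = any(0xDC80 <= ord(ch) <= 0xDCFF for ch in as_text)

# ...but the original bytes come back unchanged when the name leaves Python.
ascii_ok = roundtrip(utf8_name, "ascii") == utf8_name

# The same holds for bytes that are undecodable under a UTF-8 locale.
utf8_ok = roundtrip(latin1_name, "utf-8") == latin1_name

# A strict encode of the surrogate-carrying text fails, which is the case
# callers that manipulate or re-encode the name have to handle.
try:
    as_text.encode("utf-8")
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True
```

The point of the sketch is only that an untouched name survives the trip byte for byte regardless of which encoding the locale picks; it says nothing about what other programs expect to receive.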

Now, if POSIX (or a given Unix platform, as OS X did) would say "utf-8 is the standard filesystem and program interchange encoding", we could change Python.  Short of that, it is our experience that using anything other than the locale leads to more problems than using the locale does.

History
Date	User	Action	Args
2011-12-21 22:54:43	r.david.murray	set	recipients: + r.david.murray, pitrou, vstinner, vila, benjamin.peterson, gz, poolie
2011-12-21 22:54:43	r.david.murray	set	messageid: <1324508083.56.0.40180991722.issue13643@psf.upfronthosting.co.za>
2011-12-21 22:54:42	r.david.murray	link	issue13643 messages
2011-12-21 22:54:42	r.david.murray	create