
Author ncoghlan
Recipients ezio.melotti, lemburg, loewis, ncoghlan, r.david.murray, rsc1975, serhiy.storchaka, vstinner
Date 2015-09-01.00:05:15
Message-id <1441065915.56.0.452897619697.issue24968@psf.upfronthosting.co.za>
Content
Looking again at the *specific* bug report here, I'm moving the resolution to "out of date", as it's actually the problem we addressed in 3.5 by enabling surrogateescape by default on all of the standard streams (not just stderr) when the OS claims the locale encoding is ASCII: http://bugs.python.org/issue19977

That allows us to at least correctly roundtrip data, even if the OS has given us bad encoding settings.
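
As a minimal sketch of what "roundtrip" means here (the sample string is just an illustration, not taken from the original report): bytes that are not valid ASCII fail to decode under the default strict error handler, but decode to lone surrogates under surrogateescape and encode back to the identical bytes.

```python
# Illustrative input: UTF-8 bytes arriving on a stream that claims to be ASCII.
raw = "café".encode("utf-8")  # b'caf\xc3\xa9'

# With the default "strict" error handler, ASCII decoding fails outright.
try:
    raw.decode("ascii")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# With surrogateescape, each undecodable byte 0xNN becomes the lone
# surrogate U+DCNN, so no information is lost.
text = raw.decode("ascii", errors="surrogateescape")

# Encoding with the same handler restores the original bytes exactly.
roundtripped = text.encode("ascii", errors="surrogateescape")

print(strict_ok, ascii(text), roundtripped == raw)
```

The decoded string is not meaningful text (it contains unpaired surrogates), but it can be passed through and written back out without corrupting the underlying data.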

The problem with forcing UTF-8 more generally when the OS claims ASCII is that it may be the wrong thing to do and result in data corruption, especially on systems using East Asian codecs. Querying /etc/locale.conf [1] instead of relying on the nominal glibc locale settings should reliably give us correct encoding/locale information on modern Linux systems in cases like this one, where SSH has forwarded mismatched locale settings from a client system to a server shell session.
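
A rough sketch of what querying that file could look like, under several assumptions: the helper names below are hypothetical, the parsing handles only simple KEY=VALUE lines with optional quotes (not full shell syntax), and the LC_CTYPE-before-LANG precedence is an assumption about which setting should win. This is not how CPython actually determines its encoding.

```python
def parse_locale_conf(text):
    """Parse simple KEY=VALUE lines from locale.conf-style text (hypothetical helper)."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, comments, and anything that is not an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Strip surrounding whitespace and one layer of simple quoting.
        settings[key.strip()] = value.strip().strip("\"'")
    return settings


def codeset_from_locale(name):
    """Return the codeset part of e.g. 'ja_JP.eucJP@modifier', or None if absent."""
    if not name or "." not in name:
        return None
    return name.split(".", 1)[1].split("@", 1)[0]


def preferred_codeset(settings):
    """Pick a codeset, checking LC_CTYPE before LANG (assumed precedence)."""
    for key in ("LC_CTYPE", "LANG"):
        codeset = codeset_from_locale(settings.get(key))
        if codeset:
            return codeset
    return None


# Illustrative file contents; a real caller would read /etc/locale.conf.
sample = 'LANG="ja_JP.eucJP"\n# system default\nLC_MESSAGES=C\n'
print(preferred_codeset(parse_locale_conf(sample)))
```

On a system configured like the sample, this yields a non-UTF-8 codeset, which is exactly the case where blindly forcing UTF-8 would be the wrong choice.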

Another issue with relevant background discussion is issue #23993, which speculated on extending the "default to surrogateescape" idea to all open() calls when glibc claims the locale encoding is ASCII.

[1] http://www.freedesktop.org/software/systemd/man/locale.conf.html
History
Date                 User      Action  Args
2015-09-01 00:05:15  ncoghlan  set     recipients: + ncoghlan, lemburg, loewis, vstinner, ezio.melotti, r.david.murray, serhiy.storchaka, rsc1975
2015-09-01 00:05:15  ncoghlan  set     messageid: <1441065915.56.0.452897619697.issue24968@psf.upfronthosting.co.za>
2015-09-01 00:05:15  ncoghlan  link    issue24968 messages
2015-09-01 00:05:15  ncoghlan  create