
Author ncoghlan
Recipients abadger1999, benjamin.peterson, ezio.melotti, lemburg, ncoghlan, pitrou, vstinner
Date 2013-08-12.15:19:33
Message-id <1376320774.07.0.703505392881.issue18713@psf.upfronthosting.co.za>
Content
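One problem with Unicode in 3.x is that the surrogateescape error handler isn't enabled by default on sys.stdin and sys.stdout. This means the following code will fail with UnicodeEncodeError if any filename can't be decoded with the filesystem encoding (os.listdir() passes the undecodable bytes through as lone surrogates, which a strict sys.stdout then refuses to encode):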

    import os
    print("\n".join(os.listdir()))
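A minimal sketch that reproduces the failure without creating a badly named file on disk. It assumes a UTF-8 filesystem encoding, decodes a hypothetical Latin-1 filename the way os.listdir() would (os.fsdecode() uses surrogateescape), and prints it to an io.TextIOWrapper standing in for sys.stdout. The helper name render_names is illustrative only.

```python
import io


def render_names(raw_names, errors="strict"):
    """Print decoded filenames to an in-memory stand-in for sys.stdout.

    Returns the bytes written; raises UnicodeEncodeError if the error
    handler can't encode the lone surrogates.
    """
    # Decode as os.fsdecode() does when the filesystem encoding is UTF-8.
    names = [raw.decode("utf-8", "surrogateescape") for raw in raw_names]
    buffer = io.BytesIO()
    # UTF-8 text layer; 'strict' mirrors the current sys.stdout default.
    fake_stdout = io.TextIOWrapper(buffer, encoding="utf-8", errors=errors, newline="\n")
    print("\n".join(names), file=fake_stdout)
    fake_stdout.flush()
    return buffer.getvalue()


names = [b"notes.txt", b"caf\xe9"]  # the second name is Latin-1, not valid UTF-8
try:
    render_names(names)
except UnicodeEncodeError as exc:
    print("strict stdout:", exc)
print("surrogateescape stdout:", render_names(names, errors="surrogateescape"))
```

With the strict handler the lone surrogate '\udce9' can't be encoded; with surrogateescape the original byte 0xE9 is written back out unchanged.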

We don't really want to enable surrogateescape on sys.stdin or sys.stdout unilaterally, as it increases the risk of silent data corruption when the filesystem encoding and the IO encodings don't match.
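To make the corruption risk concrete, here is a minimal sketch assuming a UTF-8 filesystem encoding and a Latin-1 stdout, both using surrogateescape (an illustrative mismatch, not one named in the issue). The escaped byte is copied raw into the Latin-1 output, gets read back as a real character, and is then re-encoded as UTF-8, so the name comes back as different bytes and no error is raised. The function name is hypothetical.

```python
def mismatched_roundtrip(raw_name: bytes) -> bytes:
    # Decode the filename as os.fsdecode() would with a UTF-8 filesystem encoding.
    name = raw_name.decode("utf-8", "surrogateescape")
    # Write it to a Latin-1 stdout: real characters are transcoded,
    # escaped bytes are copied through raw.
    written = name.encode("latin-1", "surrogateescape")
    # A consumer reads that output back using the stream's own encoding...
    read_back = written.decode("latin-1", "surrogateescape")
    # ...and passes the text to an OS API, which encodes with the filesystem encoding.
    return read_back.encode("utf-8", "surrogateescape")


original = b"\xc3\xa9\xff"  # 'é' in UTF-8 followed by an undecodable byte
print(original, "->", mismatched_roundtrip(original))
```

Here the trailing 0xFF turns into the two-byte UTF-8 sequence for 'ÿ', which is exactly the kind of quiet damage a strict handler would have reported as an error.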

Last night, Toshio and I thought of a possible solution: on non-Windows systems, enable surrogateescape by default for sys.stdin and sys.stdout if (and only if) they're using the same codec as the one returned by sys.getfilesystemencoding(). The check should allow for codec aliases rather than being a simple string comparison.
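The proposed rule can be sketched as follows. This is a hypothetical illustration, not CPython's actual startup code: the function names are invented, and alias handling relies on codecs.lookup(), whose CodecInfo.name gives a canonical name (for example, "UTF8" and "utf_8" both resolve to "utf-8").

```python
import codecs
import sys


def same_codec(a, b):
    # Compare canonical codec names so that aliases match.
    return codecs.lookup(a).name == codecs.lookup(b).name


def proposed_stream_errors(stream_encoding, platform=sys.platform, fs_encoding=None):
    """Hypothetical default error handler for sys.stdin / sys.stdout."""
    if fs_encoding is None:
        fs_encoding = sys.getfilesystemencoding()
    # The proposal only applies to non-Windows systems.
    if platform.startswith("win"):
        return "strict"
    if same_codec(stream_encoding, fs_encoding):
        return "surrogateescape"
    return "strict"
```

For example, proposed_stream_errors("UTF-8", platform="linux", fs_encoding="utf8") selects surrogateescape, while a Latin-1 stream on a UTF-8 filesystem stays strict.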

This means that on full UTF-8 systems (which include most modern Linux installations), round-tripping between the standard streams and OS-facing APIs will be enabled by default, while systems where the encodings don't match will still fail noisily.
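A minimal sketch of that round trip, assuming a full UTF-8 system: filenames arrive on a stand-in for sys.stdin (for instance piped from ls), are decoded with surrogateescape, and are encoded back the way os.fsencode() would with a UTF-8 filesystem encoding. The function name is hypothetical.

```python
import io


def roundtrip_via_stdin(raw_input: bytes) -> list:
    # Stand-in for sys.stdin with the proposed default error handler.
    fake_stdin = io.TextIOWrapper(io.BytesIO(raw_input), encoding="utf-8",
                                  errors="surrogateescape", newline="\n")
    names = [line.rstrip("\n") for line in fake_stdin]
    # Encode back as os.fsencode() does when the filesystem encoding is UTF-8.
    return [name.encode("utf-8", "surrogateescape") for name in names]


print(roundtrip_via_stdin(b"notes.txt\ncaf\xe9\n"))
```

Because both sides use the same codec and the same error handler, the undecodable byte 0xE9 survives the trip and would name the same file on disk.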

A more general alternative is also possible: default to errors='surrogateescape' for *any* text stream that uses the filesystem encoding. It's primarily the standard streams we're interested in fixing, though.
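The more general rule could look roughly like this hypothetical helper, which wraps any binary stream and picks the error handler from the encoding alone. The names default_errors and wrap_text are invented for illustration; nothing like them exists in the io module.

```python
import codecs
import io
import sys


def default_errors(encoding, fs_encoding=None):
    """Hypothetical rule: surrogateescape iff the stream uses the filesystem codec."""
    if fs_encoding is None:
        fs_encoding = sys.getfilesystemencoding()
    # codecs.lookup() canonicalises aliases such as "UTF8" -> "utf-8".
    if codecs.lookup(encoding).name == codecs.lookup(fs_encoding).name:
        return "surrogateescape"
    return "strict"


def wrap_text(binary_stream, encoding, fs_encoding=None):
    """Hypothetical helper: wrap any binary stream, applying the rule above."""
    return io.TextIOWrapper(binary_stream, encoding=encoding,
                            errors=default_errors(encoding, fs_encoding), newline="\n")
```

Under this rule a UTF-8 log file on a UTF-8 system would accept lone surrogates just like stdout, whereas a stream in any other encoding keeps the strict behaviour.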
History
Date                 User      Action  Args
2013-08-12 15:19:34  ncoghlan  set     recipients: + ncoghlan, lemburg, pitrou, vstinner, abadger1999, benjamin.peterson, ezio.melotti
2013-08-12 15:19:34  ncoghlan  set     messageid: <1376320774.07.0.703505392881.issue18713@psf.upfronthosting.co.za>
2013-08-12 15:19:33  ncoghlan  link    issue18713 messages
2013-08-12 15:19:33  ncoghlan  create