
Author a.badger
Recipients a.badger, abadger1999, benjamin.peterson, ezio.melotti, lemburg, ncoghlan, pitrou, r.david.murray, vstinner
Date 2013-08-21.00:16:29
Message-id <1377044191.58.0.292561564195.issue18713@psf.upfronthosting.co.za>
In-reply-to
Content
Nick and I talked about this at a recent conference and came at it from different directions.  On the one hand, Nick made the point that encoding surrogateescape'd text to bytes via a different encoding corrupts the data as a whole.  On the other hand, I made the point that raising an exception when doing something as basic as printing a value of text type reintroduces the issues that Python 2 had with respect to unicode, bytes, and encodings, particularly because the exception is raised far from the source of the problem (the point where the data enters the program).
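
To make the second point concrete, here is a minimal sketch of the failure mode. It assumes a strict UTF-8 output stream; `strict_out` is a hypothetical in-memory stand-in for such a stdout, and the file name is made up for illustration.

```python
import io

# Bytes as the OS might hand them over: a Latin-1 encoded name, not valid UTF-8.
raw = b"caf\xe9.txt"

# Decoding succeeds: the undecodable byte 0xE9 becomes the lone surrogate U+DCE9.
name = raw.decode("utf-8", errors="surrogateescape")

# Hypothetical stand-in for a stdout that uses strict UTF-8 encoding.
strict_out = io.TextIOWrapper(
    io.BytesIO(), encoding="utf-8", errors="strict", newline="\n"
)

try:
    # The error surfaces here, at print time, not where the data came in.
    print(name, file=strict_out)
    failed = False
except UnicodeEncodeError as exc:
    failed = True
    bad_index = exc.start  # index of the escaped character within the text
```

The decode step raises nothing, so the program can carry `name` around for a long time before the exception finally appears at the output call.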

After some thought, Nick came up with this solution.  The idea is that surrogateescape was originally accepted to allow roundtripping data from the OS and back when the OS considers it to be a "string" but Python does not consider it to be "text".  When that's the case, we know which encoding was used in the attempt to construct the text in Python.  If that same encoding is used to re-encode the data on the way back to the OS, then we're successfully roundtripping the data we were given in the first place.  So this is just applying the original goal to another API.
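
The roundtrip described above can be sketched as follows. This is only an illustration under stated assumptions, not the change proposed in the issue: `roundtrip_through_stream` is a hypothetical helper, and an in-memory `io.BytesIO` stands in for the real output stream.

```python
import io


def roundtrip_through_stream(raw: bytes, encoding: str) -> bytes:
    """Decode OS bytes, then write the text back out with the same encoding.

    Hypothetical helper for illustration; both directions use surrogateescape.
    """
    # Undecodable bytes are smuggled into the text as lone surrogates.
    text = raw.decode(encoding, errors="surrogateescape")

    sink = io.BytesIO()  # stand-in for the OS-facing output stream
    out = io.TextIOWrapper(
        sink, encoding=encoding, errors="surrogateescape", newline="\n"
    )
    out.write(text)
    out.flush()
    # The surrogates are turned back into the original bytes on the way out.
    return sink.getvalue()


raw = b"caf\xe9.txt"  # Latin-1 bytes, not valid UTF-8
result = roundtrip_through_stream(raw, "utf-8")
```

Because the same encoding is used on the way in and on the way out, `result` holds exactly the bytes that were received, even though they never formed valid UTF-8.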
History
Date                 User      Action  Args
2013-08-21 00:16:31  a.badger  set     recipients: + a.badger, lemburg, ncoghlan, pitrou, vstinner, abadger1999, benjamin.peterson, ezio.melotti, r.david.murray
2013-08-21 00:16:31  a.badger  set     messageid: <1377044191.58.0.292561564195.issue18713@psf.upfronthosting.co.za>
2013-08-21 00:16:31  a.badger  link    issue18713 messages
2013-08-21 00:16:29  a.badger  create