This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Author ncoghlan
Recipients Arfrever, ezio.melotti, lemburg, ncoghlan, pitrou, r.david.murray, serhiy.storchaka, vstinner
Date 2014-09-23.10:51:12
SpamBayes Score -1.0
Marked as misclassified Yes
Message-id <1411469472.48.0.611257291181.issue18814@psf.upfronthosting.co.za>
In-reply-to
Content
The error handler is called "surrogateescape". That means "convert_surrogateescape" is always only a single step away from thinking "I want to remove the smuggled bytes from a surrogateescape'd string", without needing to assume any knowledge on the part of the user other than the name of the error handler and the fact that it is used to smuggle arbitrary bytes through the Python 3 str type.

Getting from "this string was decoded with the surrogateescape handler and may contain smuggled bytes" to "filter_non_utf8_data" as the relevant cleanup function is a much bigger leap that requires more assumed knowledge on the part of the user, and also one that confuses the conceptual purpose of the function (cleaning up the output of the surrogateescape error handler to ensure it is a pure Unicode string) with the internal details of the proposed approach to implementing that cleanup operation (encoding to UTF-8 with surrogateescape, and then decoding again with a different error handler).
History
Date User Action Args
2014-09-23 10:51:12ncoghlansetrecipients: + ncoghlan, lemburg, pitrou, vstinner, ezio.melotti, Arfrever, r.david.murray, serhiy.storchaka
2014-09-23 10:51:12ncoghlansetmessageid: <1411469472.48.0.611257291181.issue18814@psf.upfronthosting.co.za>
2014-09-23 10:51:12ncoghlanlinkissue18814 messages
2014-09-23 10:51:12ncoghlancreate