Message227340
The error handler is called "surrogateescape". That means the name "convert_surrogateescape" is only ever a single step away from the thought "I want to remove the smuggled bytes from a surrogateescape'd string", and it assumes no knowledge on the part of the user beyond the name of the error handler and the fact that it is used to smuggle arbitrary bytes through the Python 3 str type.
Getting from "this string was decoded with the surrogateescape handler and may contain smuggled bytes" to "filter_non_utf8_data" as the relevant cleanup function is a much bigger leap that requires more assumed knowledge on the part of the user. It also confuses the conceptual purpose of the function (cleaning up the output of the surrogateescape error handler to ensure it is a pure Unicode string) with the internal details of the proposed approach to implementing that cleanup operation (encoding to UTF-8 with surrogateescape, and then decoding again with a different error handler).
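For illustration, the implementation approach described above (encode to UTF-8 with surrogateescape, then decode again with a different error handler) might look roughly like the sketch below. The function name comes from this discussion, but the signature, the `errors` parameter, and its "replace" default are assumptions made for the example, not a settled API.

```python
def convert_surrogateescape(text, errors="replace"):
    """Hypothetical sketch: return a pure Unicode string with any
    bytes smuggled in by surrogateescape cleaned up.

    The signature and the default handler are assumptions.
    """
    # Step 1: turn the lone surrogates back into the original raw bytes.
    raw = text.encode("utf-8", errors="surrogateescape")
    # Step 2: decode again, letting a different handler deal with them.
    return raw.decode("utf-8", errors=errors)


# A byte that is never valid UTF-8, smuggled through str.
smuggled = b"abc\xff".decode("utf-8", errors="surrogateescape")

print(ascii(smuggled))                                            # 'abc\udcff'
print(ascii(convert_surrogateescape(smuggled)))                   # 'abc\ufffd'
print(ascii(convert_surrogateescape(smuggled, errors="ignore")))  # 'abc'
```

With "replace" each smuggled byte sequence becomes U+FFFD, while "ignore" drops it; strings that contain no smuggled bytes pass through unchanged.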
Date | User | Action | Args
2014-09-23 10:51:12 | ncoghlan | set | recipients: + ncoghlan, lemburg, pitrou, vstinner, ezio.melotti, Arfrever, r.david.murray, serhiy.storchaka
2014-09-23 10:51:12 | ncoghlan | set | messageid: <1411469472.48.0.611257291181.issue18814@psf.upfronthosting.co.za>
2014-09-23 10:51:12 | ncoghlan | link | issue18814 messages
2014-09-23 10:51:12 | ncoghlan | create |