Author methane
Recipients kadler, methane, serhiy.storchaka
Date 2020-10-08.08:34:22
Message-id <1602146062.83.0.831044137678.issue41894@roundup.psfhosted.org>
Content
> I think that it is more correct to use the locale encoding. If error messages are translated for readability, we should not ruin this by outputting \xXX.

* PyUnicode_DecodeLocale() doesn't support the "backslashreplace" error handler.
* The error message is usually encoded in the locale encoding, but that is not guaranteed.
* The error message may contain a path, which may not be in the locale encoding either.
* \xXX is far better than a UnicodeDecodeError anyway. We need to fix the UnicodeDecodeError first.
* Non-UTF-8 locales are rare. We have used this code for a long time, and this issue was not reported until now.
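
The trade-off between a UnicodeDecodeError and \xXX output can be reproduced from Python. This is only a minimal sketch, not the patch itself: the sample bytes and the helper name `decode_error_message` are hypothetical, and it assumes a message containing a Latin-1 byte that is decoded as UTF-8.

```python
# Hypothetical error message bytes; the trailing 0xE9 is Latin-1, not valid UTF-8.
raw_message = b"cannot open caf\xe9"


def decode_error_message(raw: bytes, encoding: str = "utf-8") -> str:
    # Hypothetical helper: never raises, undecodable bytes become \xXX escapes.
    return raw.decode(encoding, errors="backslashreplace")


# The default (strict) handler raises on the undecodable byte.
try:
    raw_message.decode("utf-8")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# backslashreplace keeps the readable part and escapes only the bad byte.
escaped = decode_error_message(raw_message)
print(strict_failed)
print(escaped)
```

With backslashreplace, the user still sees most of the message; with strict decoding, the original error is replaced by an unrelated UnicodeDecodeError.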

I am not against adding "backslashreplace" to PyUnicode_DecodeLocale(). But to backport the bugfix for the UnicodeDecodeError, the change should be minimal.

So the main question is: should we allow surrogateescape in error messages?
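
To make the question concrete, here is a minimal sketch of what surrogateescape would put into a message. It again assumes a hypothetical message with one Latin-1 byte decoded as UTF-8. The resulting string carries a lone surrogate: it round-trips back to the original bytes, but it cannot be encoded with the strict handler, so writing it to a log or a stream may fail later.

```python
# Hypothetical error message bytes; 0xE9 is not valid UTF-8 here.
raw_message = b"cannot open caf\xe9"

# surrogateescape maps the undecodable byte 0xE9 to the lone surrogate U+DCE9.
text = raw_message.decode("utf-8", errors="surrogateescape")

# Encoding with the same handler restores the original bytes exactly.
round_trip = text.encode("utf-8", errors="surrogateescape")

# Encoding with the default strict handler fails on the lone surrogate.
try:
    text.encode("utf-8")
    strict_encode_failed = False
except UnicodeEncodeError:
    strict_encode_failed = True

print(ascii(text))
print(round_trip == raw_message, strict_encode_failed)
```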

For the record, PyUnicode_DecodeLocale() uses mbstowcs(). I don't know how reliable that function is on various platforms. That is why I suggested PyUnicode_DecodeFSDefault() at first.