Message378223
> I think that it is more correct to use the locale encoding. If error messages are translated for readability, we should not ruin this by outputting \xXX.
* PyUnicode_DecodeLocale() doesn't support the "backslashreplace" error handler.
* Error messages are usually encoded in the locale encoding, but that is not guaranteed.
* An error message may contain a path, which may not be in the locale encoding either.
* \xXX is far better than UnicodeDecodeError anyway. We need to fix the UnicodeDecodeError first.
* Non-UTF-8 locales are rare. We have used this code for a long time, but this issue was not reported until now.
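To illustrate the trade-off in the list above, here is a minimal Python-level sketch (not the C code under discussion). The message bytes and the `decode_message` helper are hypothetical; the sketch only assumes an error message that is not valid in the encoding used to decode it.

```python
# Hypothetical error message bytes: Latin-1 encoded, so not valid UTF-8.
raw_message = "caf\u00e9 not found".encode("latin-1")  # b'caf\xe9 not found'


def decode_message(data: bytes, encoding: str = "utf-8") -> str:
    """Decode without raising; undecodable bytes are shown as \\xXX escapes."""
    return data.decode(encoding, errors="backslashreplace")


# Strict decoding is what produces the UnicodeDecodeError.
try:
    raw_message.decode("utf-8")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# With "backslashreplace" the message is less pretty but still readable.
message = decode_message(raw_message)
print(strict_failed, message)
```

The escaped output loses the translated character, but the caller still gets a usable message instead of a second, unrelated exception.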
I'm not against adding "backslashreplace" to PyUnicode_DecodeLocale(). But to backport the bugfix for the UnicodeDecodeError, the change should be minimal.
So the main question is: should we allow surrogateescape in error messages?
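As a rough illustration of what allowing surrogateescape would mean for an error message, the following Python sketch decodes hypothetical message bytes with that handler. The byte string and variable names are assumptions made for the example only.

```python
# Hypothetical message bytes containing one byte that is invalid in UTF-8.
raw_bytes = b"caf\xe9 not found"

# "surrogateescape" never fails; the bad byte 0xE9 becomes the lone surrogate U+DCE9.
text = raw_bytes.decode("utf-8", errors="surrogateescape")
has_lone_surrogate = "\udce9" in text

# The resulting str cannot be encoded strictly, e.g. when it is later written out.
try:
    text.encode("utf-8")
    strict_encode_failed = False
except UnicodeEncodeError:
    strict_encode_failed = True

# It round-trips only if the consumer also uses "surrogateescape".
round_trip = text.encode("utf-8", errors="surrogateescape")
print(has_lone_surrogate, strict_encode_failed, round_trip == raw_bytes)
```

So the original bytes are preserved, but any code that later encodes the message strictly has to be prepared for lone surrogates.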
For the record, PyUnicode_DecodeLocale() uses mbstowcs(). I don't know how reliable that function is across platforms. That is why I suggested PyUnicode_DecodeFSDefault() at first.
Date | User | Action | Args
2020-10-08 08:34:22 | methane | set | recipients: + methane, serhiy.storchaka, kadler
2020-10-08 08:34:22 | methane | set | messageid: <1602146062.83.0.831044137678.issue41894@roundup.psfhosted.org>
2020-10-08 08:34:22 | methane | link | issue41894 messages
2020-10-08 08:34:22 | methane | create |