Message378033
I succeeded in reproducing it on Ubuntu 20.04.
$ sudo vi /var/lib/locales/supported.d/ja # add "ja_JP.EUC-JP EUC-JP"
$ sudo locale-gen ja_JP.EUC-JP
Generating locales (this might take a while)...
ja_JP.EUC-JP... done
Generation complete.
$ chmod -r ./build/lib.linux-x86_64-3.10/_sha3.cpython-310-x86_64-linux-gnu.so
$ LC_ALL=ja_JP.eucjp ./python
Python 3.10.0a0 (heads/master:fbf43f051e, Aug 17 2020, 15:13:52)
[GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import locale
>>> locale.setlocale(locale.LC_ALL, "")
'ja_JP.eucjp'
>>> import _sha3
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb6 in position 101: invalid start byte
The error message contains a file path (a byte string, probably encoded with the filesystem encoding) and a translated error message (encoded with the locale encoding).
I wanted to use the "backslashreplace" error handler, but neither PyUnicode_DecodeLocale() nor PyUnicode_DecodeFSDefault() supports it.
After thinking about this for several minutes, I now prefer PyUnicode_DecodeUTF8(msg, strlen(msg), "backslashreplace").
It fixes the issue with minimal behavior change, although the non-UTF-8 bytes in the error message still show up as backslash escapes.
It might be the best practice for creating a Unicode object from a C error message such as strerror(3).
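
For illustration, here is a rough Python-level sketch of what decoding with "backslashreplace" does to such a message. The byte string is made up for this example (an ASCII path followed by an EUC-JP encoded translation); it is not the actual dlerror() output from the session above, and decode_c_message() is a hypothetical helper, not an existing CPython function.

```python
# Hypothetical message in the style of a C library error: an ASCII file
# path followed by a translated message in the locale encoding (EUC-JP).
# The Japanese text is only an illustrative stand-in.
path = b"./build/_sha3.so: "
translated = "そのようなファイルやディレクトリはありません".encode("euc_jp")
msg = path + translated


def decode_c_message(raw: bytes) -> str:
    """Decode as UTF-8, escaping undecodable bytes instead of raising.

    Python-level analogue of
    PyUnicode_DecodeUTF8(msg, strlen(msg), "backslashreplace").
    """
    return raw.decode("utf-8", "backslashreplace")


# Strict UTF-8 decoding fails on the EUC-JP bytes, as in the traceback.
try:
    msg.decode("utf-8")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# With backslashreplace the path survives intact and the undecodable
# bytes appear as \xNN escapes, so no exception is raised.
text = decode_c_message(msg)
print(strict_failed)
print(text)
```

The result is lossy: the translated part is not readable, and byte pairs that happen to form valid UTF-8 are decoded as unrelated characters. The path stays intact, though, and the import failure is reported as the real error rather than as a UnicodeDecodeError.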
Date | User | Action | Args
2020-10-05 14:30:27 | methane | set | recipients: + methane, kadler
2020-10-05 14:30:27 | methane | set | messageid: <1601908227.53.0.894991336992.issue41894@roundup.psfhosted.org>
2020-10-05 14:30:27 | methane | link | issue41894 messages
2020-10-05 14:30:27 | methane | create |