Message 378033 - Python tracker


Author	methane
Recipients	kadler, methane
Date	2020-10-05.14:30:27
Message-id	<1601908227.53.0.894991336992.issue41894@roundup.psfhosted.org>

Content

I managed to reproduce it on Ubuntu 20.04.

    $ sudo vi /var/lib/locales/supported.d/ja # add "ja_JP.EUC-JP EUC-JP"
    $ sudo locale-gen ja_JP.EUC-JP
    Generating locales (this might take a while)...
    ja_JP.EUC-JP... done
    Generation complete.
    $ chmod -r ./build/lib.linux-x86_64-3.10/_sha3.cpython-310-x86_64-linux-gnu.so
    $ LC_ALL=ja_JP.eucjp ./python
    Python 3.10.0a0 (heads/master:fbf43f051e, Aug 17 2020, 15:13:52)
    [GCC 9.3.0] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import locale
    >>> locale.setlocale(locale.LC_ALL, "")
    'ja_JP.eucjp'
    >>> import _sha3
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb6 in position 101: invalid start byte

The error message contains a file path (a byte string, probably encoded with the filesystem encoding) and a translated error message (encoded with the locale encoding).
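
The mixed encodings can be illustrated with a minimal Python sketch. The path and the Japanese text here are hypothetical stand-ins chosen for the example, not the exact bytes from the session above; the point is only that EUC-JP bytes appended to an ASCII path are not valid UTF-8.

```python
# Hypothetical stand-ins for the real message: an ASCII path plus a
# translated error string encoded with the locale encoding (EUC-JP).
path = b"./build/lib/_sha3.so"
translated = "許可がありません".encode("euc_jp")  # illustrative Japanese text
msg = path + b": " + translated

error = None
try:
    msg.decode("utf-8")  # strict UTF-8 decoding, as in the traceback
except UnicodeDecodeError as exc:
    error = exc

# The reported position falls after the ASCII path, inside the EUC-JP part.
print(error)
```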

I want to use the "backslashreplace" error handler, but neither PyUnicode_DecodeLocale() nor PyUnicode_DecodeFSDefault() supports it.

After thinking about this for several minutes, I now prefer PyUnicode_DecodeUTF8(msg, strlen(msg), "backslashreplace").
It fixes the issue with a minimal behavior change, although the undecodable part of the error message is still shown backslash-escaped.
It might be the best practice for creating a Unicode object from a C error message such as the result of strerror(3).
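
As a rough sketch of what that call does, the Python-level equivalent is `bytes.decode("utf-8", errors="backslashreplace")`, which uses the same codec and error handler. The message bytes below are hypothetical, built only for illustration.

```python
# Hypothetical message: an ASCII path followed by EUC-JP encoded text.
msg = b"./build/lib/_sha3.so: " + "許可がありません".encode("euc_jp")

# Bytes that are not valid UTF-8 become \xNN escapes instead of raising.
text = msg.decode("utf-8", errors="backslashreplace")
print(text)
```

The ASCII path survives unchanged, while the undecodable bytes appear as `\xNN` escapes. Byte pairs that happen to form a valid UTF-8 sequence may be decoded as unrelated characters, so the result is safe to display but not reversible in general.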
