Message 389813 - Python tracker


This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

Author	kulikjak
Recipients	ezio.melotti, kulikjak, vstinner
Date	2021-03-30.10:11:33
SpamBayes Score	-1.0
Marked as misclassified	Yes
Message-id	<1617099094.43.0.768783655635.issue43667@roundup.psfhosted.org>
In-reply-to

Content
On Linux, wchar_t values are Unicode code points (UTF-32); however, this does not have to be the case, as the C standard does not mandate any particular representation for wchar_t, and Solaris indeed uses a different one.
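
The Linux behavior can be observed from Python by looking at the raw wchar_t units CPython writes for a string. This is a minimal sketch: `wchar_units` is a hypothetical helper, and it assumes a platform where wchar_t holds Unicode code points (Linux, or Windows for characters in the Basic Multilingual Plane). On such a platform, each unit is simply the character's code point.

```python
import ctypes
import sys


def wchar_units(text):
    """Return the integer wchar_t units CPython stores for ``text``.

    Hypothetical helper for illustration only; assumes wchar_t holds
    Unicode code points, as on Linux.
    """
    unit = ctypes.sizeof(ctypes.c_wchar)        # 4 bytes on Linux, 2 on Windows
    buf = ctypes.create_unicode_buffer(text)    # NUL-terminated wchar_t array
    raw = ctypes.string_at(ctypes.addressof(buf), ctypes.sizeof(buf))
    units = [
        int.from_bytes(raw[i:i + unit], sys.byteorder)
        for i in range(0, len(raw), unit)
    ]
    return units[:-1]                           # drop the trailing NUL unit


print([hex(u) for u in wchar_units("é€")])  # on Linux: ['0xe9', '0x20ac']
```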

In Oracle Solaris, the internal form of wchar_t is specific to a locale; in the Unicode locales, wchar_t has the UTF-32 Unicode encoding form, and other locales have different representations [1].

This is a problem because Python assumes that wchar_t values are Unicode code points. On Oracle Solaris with a non-Unicode locale, this results either in errors (when the values fall outside the Unicode range) or in output containing the wrong characters.
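
The assumption is visible through `ctypes.wstring_at`, which passes a raw wchar_t buffer to `PyUnicode_FromWideChar`. In the sketch below, `decode_raw_wchar` is a hypothetical helper that builds a wchar_t array from plain integers and decodes it. Each integer is taken as a code point, and on a platform with a 4-byte wchar_t (such as Linux) a value above U+10FFFF raises ValueError. On Oracle Solaris in a non-Unicode locale, an interpreter without a fix would apply the same interpretation to locale-specific codes, which is where the wrong characters or errors come from.

```python
import ctypes


def decode_raw_wchar(values):
    """Decode a list of integers as a wchar_t string, the way CPython does.

    Hypothetical helper for illustration only: ctypes.wstring_at() hands
    the buffer to PyUnicode_FromWideChar(), which treats every wchar_t
    unit as a Unicode code point.
    """
    size = ctypes.sizeof(ctypes.c_wchar)
    int_type = {2: ctypes.c_uint16, 4: ctypes.c_uint32}[size]  # same width as wchar_t
    buf = (int_type * len(values))(*values)
    return ctypes.wstring_at(ctypes.addressof(buf), len(values))


print(decode_raw_wchar([0x48, 0x69]))  # prints Hi

if ctypes.sizeof(ctypes.c_wchar) == 4:
    try:
        decode_raw_wchar([0x110000])   # one past the last Unicode code point
    except ValueError as exc:
        print("rejected:", exc)
```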

Unicode locales work as expected, but switching to them is not an acceptable workaround for some Oracle Solaris users, who cannot use a Unicode encoding for various reasons.

Because of that, a few months ago we fixed this with a patch to `PyUnicode_FromWideChar`, the function that handles the conversion to Unicode (the patch is attached to the PR). It has been tested over the last half year, and we have not seen any related issues since.
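
As a rough illustration of what a locale-aware conversion has to do (a sketch only, not the patch attached to the PR), one option is to let the C library re-encode the wide string into the locale's multibyte form with `wcstombs()`, which understands the locale-specific wchar_t representation, and then decode those bytes using the locale's codeset. Assumptions: a POSIX system where `ctypes.CDLL(None)` exposes the C library, and a codeset name that Python's codec registry recognizes; `decode_locale_wchar` is a hypothetical helper.

```python
import ctypes
import locale

# Assumption: POSIX system where CDLL(None) exposes the C library's symbols.
_libc = ctypes.CDLL(None)
_libc.wcstombs.argtypes = [ctypes.c_void_p, ctypes.c_void_p, ctypes.c_size_t]
_libc.wcstombs.restype = ctypes.c_size_t
_SIZE_MAX = ctypes.c_size_t(-1).value  # wcstombs() returns (size_t)-1 on failure


def decode_locale_wchar(values):
    """Convert locale-specific wchar_t units to a Python str.

    Hypothetical helper, not the patch from the PR: the C library converts
    the wide string to the locale's multibyte encoding, and the resulting
    bytes are decoded with the locale's codeset.
    """
    size = ctypes.sizeof(ctypes.c_wchar)
    int_type = {2: ctypes.c_uint16, 4: ctypes.c_uint32}[size]
    src = (int_type * (len(values) + 1))(*values)  # last element stays 0 (NUL)

    needed = _libc.wcstombs(None, src, 0)          # length query, NUL excluded
    if needed == _SIZE_MAX:
        raise ValueError("wide string is not valid in the current locale")

    dest = ctypes.create_string_buffer(needed + 1)
    _libc.wcstombs(dest, src, needed + 1)
    codeset = locale.nl_langinfo(locale.CODESET)   # e.g. "UTF-8"
    return dest.raw[:needed].decode(codeset)


print(decode_locale_wchar([0x48, 0x69]))  # prints Hi
```

Delegating to the C library keeps the locale-specific knowledge where it already lives; the cost is an extra round trip through bytes for every conversion.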

Is something like this acceptable, or should it be fixed in a different place or in a different way? All comments are appreciated.

[1] https://docs.oracle.com/cd/E36784_01/html/E39536/gmwkm.html

History
Date	User	Action	Args
2021-03-30 10:11:34	kulikjak	set	recipients: + kulikjak, vstinner, ezio.melotti
2021-03-30 10:11:34	kulikjak	set	messageid: <1617099094.43.0.768783655635.issue43667@roundup.psfhosted.org>
2021-03-30 10:11:34	kulikjak	link	issue43667 messages
2021-03-30 10:11:33	kulikjak	create