Issue 43667: Solaris: Fix broken Unicode encoding in non-UTF locales

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/87833

classification

Title:	Solaris: Fix broken Unicode encoding in non-UTF locales
Type:		Stage:	resolved
Components:	Unicode	Versions:	Python 3.11, Python 3.10, Python 3.9

process

Status:	closed	Resolution:	fixed
Dependencies:		Superseder:
Assigned To:		Nosy List:	ezio.melotti, kulikjak, miss-islington, pablogsal, vstinner
Priority:	normal	Keywords:	patch

Created on 2021-03-30 10:11 by kulikjak, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Pull Requests
URL	Status	Linked	Edit
PR 25096	merged	kulikjak, 2021-03-30 10:12
PR 25847	merged	kulikjak, 2021-05-03 11:37
PR 26405	merged	kulikjak, 2021-05-27 15:40
PR 26409	merged	miss-islington, 2021-05-27 17:08
PR 26410	merged	miss-islington, 2021-05-27 17:08
PR 26498	merged	miss-islington, 2021-06-02 23:47

Messages (13)
msg389813 - (view)	Author: Jakub Kulik (kulikjak) *	Date: 2021-03-30 10:11
On Linux, wchar_t values are Unicode code points (UTF-32); however, the C standard does not require that and allows an arbitrary representation, which is what Solaris uses. In Oracle Solaris, the internal form of wchar_t is specific to a locale: in Unicode locales, wchar_t has the UTF-32 Unicode encoding form, while other locales have different representations [1].

This is an issue because Python expects wchar_t to correspond to Unicode. On Oracle Solaris with a non-UTF locale, that results either in errors (values outside the Unicode range) or in output with the wrong symbols. Unicode locales work as expected, but they are not an acceptable workaround for Oracle Solaris users who cannot use a Unicode encoding for various reasons.

Because of that, we fixed it a few months ago with a patch to `PyUnicode_FromWideChar`, which handles conversion to Unicode (attached in the PR). It has been tested over the last half year, and we haven't seen any related issues since.

Is something like this acceptable, or should it be fixed in a different place or in a different way? All comments are appreciated.

[1] https://docs.oracle.com/cd/E36784_01/html/E39536/gmwkm.html
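To illustrate the "values outside the Unicode range" case, here is a minimal Python sketch of the range check involved. The helper names `is_unicode_code_point` and `describe_wchar` are made up for this illustration and are not part of CPython. The sample value 0x30000069 is the one quoted in the ValueError reported later in this issue.

```python
import sys

# Highest valid Unicode code point; 0x10FFFF on Python 3.3 and later.
MAX_CODE_POINT = sys.maxunicode


def is_unicode_code_point(value: int) -> bool:
    """Return True if value lies in the range [U+0000; U+10FFFF]."""
    return 0 <= value <= MAX_CODE_POINT


def describe_wchar(value: int) -> str:
    """Describe a raw wchar_t value (hypothetical helper for illustration)."""
    if is_unicode_code_point(value):
        return f"U+{value:04X} -> {chr(value)!r}"
    return f"0x{value:08X} is outside [U+0000; U+10FFFF]"


if __name__ == "__main__":
    # 0x69 is the code point of 'i'; 0x30000069 is a locale-specific
    # wchar_t value that cannot be interpreted as a code point.
    for raw in (0x69, 0x30000069):
        print(describe_wchar(raw))
```

A value that fails this check cannot be turned into a Python string directly, so it has to be converted to Unicode first.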
msg389814 - (view)	Author: Jakub Kulik (kulikjak) *	Date: 2021-03-30 10:12
I forgot to mention: this affects Oracle Solaris. I tested this on SmartOS and cannot reproduce it there, as it seems to use the Unicode representation for all locales. Based on the documentation, this might also affect other systems (e.g. HP-UX, whose documentation specifically says: 'These values may not be compatible with values obtained by specifying other locales that are supported'), but it's hard to tell without testing.

If the issue is present, this one-liner fails with "ValueError: character U+30000069 is not in range [U+0000; U+10ffff]":

python3.7 -c 'import datetime; import locale; locale.setlocale(locale.LC_ALL,"es_ES.ISO8859-1"); datetime.date(2001, 1, 3).strftime("%a")'
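The same reproducer can be written as a small script that also copes with systems where the locale is not installed. This is only a sketch: the function name `weekday_abbrev` and the status strings are invented for this example, while the locale name and date are the ones from the one-liner.

```python
import datetime
import locale


def weekday_abbrev(locale_name: str) -> tuple:
    """Return (status, detail) for strftime("%a") under locale_name.

    status is "ok", "locale-unavailable", or "value-error"
    (labels chosen for this sketch only).
    """
    # Calling setlocale() without a new value only queries the current setting.
    saved = locale.setlocale(locale.LC_ALL)
    try:
        try:
            locale.setlocale(locale.LC_ALL, locale_name)
        except locale.Error as exc:
            return ("locale-unavailable", str(exc))
        try:
            return ("ok", datetime.date(2001, 1, 3).strftime("%a"))
        except ValueError as exc:
            # On an affected system this is where the out-of-range
            # character error from the one-liner would surface.
            return ("value-error", str(exc))
    finally:
        # Restore the previous locale so later code is unaffected.
        locale.setlocale(locale.LC_ALL, saved)


if __name__ == "__main__":
    print(weekday_abbrev("es_ES.ISO8859-1"))
```

A "value-error" status on a non-UTF locale would indicate the problem described here; "locale-unavailable" only means the test could not be run on that machine.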
msg392429 - (view)	Author: STINNER Victor (vstinner) *	Date: 2021-04-30 13:21
New changeset 9032cf5cb1e33c0349089cfb0f6bf11ed3c30e86 by Jakub Kulík in branch 'master': bpo-43667: Fix broken Unicode encoding in non-UTF locales on Solaris (GH-25096) https://github.com/python/cpython/commit/9032cf5cb1e33c0349089cfb0f6bf11ed3c30e86
msg394116 - (view)	Author: STINNER Victor (vstinner) *	Date: 2021-05-21 14:59
New changeset d3cc68900dc99966007112f884779895daefc7db by Jakub Kulík in branch '3.9': [3.9] bpo-43667: Fix broken Unicode encoding in non-UTF locales on Solaris (GH-25096) (GH-25847) https://github.com/python/cpython/commit/d3cc68900dc99966007112f884779895daefc7db
msg394117 - (view)	Author: STINNER Victor (vstinner) *	Date: 2021-05-21 15:00
A backport to 3.8 may be more complicated. It's up to you to decide whether you want to backport it. I merged your 3.9 backport; it looks very close to the change made in the main branch.
msg394305 - (view)	Author: STINNER Victor (vstinner) *	Date: 2021-05-25 09:51
Do you want to attempt to backport the fix to 3.8, or can this issue be closed?
msg394308 - (view)	Author: Jakub Kulik (kulikjak) *	Date: 2021-05-25 09:59
Sorry for the delayed response. Considering that we are not delivering or using 3.8 in any way and this issue doesn't seem to impact anybody else, we can omit the backport to 3.8. I will prepare another PR with a news fragment, and after that, this can be considered solved and closed.
msg394309 - (view)	Author: STINNER Victor (vstinner) *	Date: 2021-05-25 10:02
I'm closing the issue, but you can still reference the bpo issue number in your PR with the changelog (NEWS) entry.
msg394572 - (view)	Author: STINNER Victor (vstinner) *	Date: 2021-05-27 17:08
New changeset 164a4f46d1606e21d82babc010e397a9116e6730 by Jakub Kulík in branch 'main': bpo-43667: Add news fragment for Solaris changes (GH-26405) https://github.com/python/cpython/commit/164a4f46d1606e21d82babc010e397a9116e6730
msg394576 - (view)	Author: STINNER Victor (vstinner) *	Date: 2021-05-27 17:23
New changeset 0574b0686d76e6f9199f800b5f32bd56eaff3c77 by Miss Islington (bot) in branch '3.10': bpo-43667: Add news fragment for Solaris changes (GH-26405) (GH-26409) https://github.com/python/cpython/commit/0574b0686d76e6f9199f800b5f32bd56eaff3c77
msg394577 - (view)	Author: STINNER Victor (vstinner) *	Date: 2021-05-27 17:23
New changeset 427232f9d221d54870fa3e89bd1dac55cf42243f by Miss Islington (bot) in branch '3.9': bpo-43667: Add news fragment for Solaris changes (GH-26405) (GH-26410) https://github.com/python/cpython/commit/427232f9d221d54870fa3e89bd1dac55cf42243f
msg394578 - (view)	Author: STINNER Victor (vstinner) *	Date: 2021-05-27 17:24
I merged your PR and backported it to add a NEWS entry, thanks.
msg396193 - (view)	Author: Pablo Galindo Salgado (pablogsal) *	Date: 2021-06-20 20:12
New changeset f87d2038fadd9c067d50fb2f1d7c2f37b9f3893a by Miss Islington (bot) in branch '3.10': bpo-43667: Add news fragment for Solaris changes (GH-26405) (GH-26498) https://github.com/python/cpython/commit/f87d2038fadd9c067d50fb2f1d7c2f37b9f3893a

History
Date	User	Action	Args
2022-04-11 14:59:43	admin	set	github: 87833
2021-06-20 20:12:16	pablogsal	set	nosy: + pablogsal messages: + msg396193
2021-06-02 23:47:47	miss-islington	set	pull_requests: + pull_request25094
2021-05-27 17:24:17	vstinner	set	messages: + msg394578
2021-05-27 17:23:55	vstinner	set	messages: + msg394577
2021-05-27 17:23:50	vstinner	set	messages: + msg394576
2021-05-27 17:08:53	miss-islington	set	pull_requests: + pull_request25004
2021-05-27 17:08:42	miss-islington	set	nosy: + miss-islington pull_requests: + pull_request25003
2021-05-27 17:08:23	vstinner	set	messages: + msg394572
2021-05-27 15:40:07	kulikjak	set	pull_requests: + pull_request24998
2021-05-25 10:02:40	vstinner	set	status: open -> closed resolution: fixed messages: + msg394309 stage: patch review -> resolved
2021-05-25 09:59:08	kulikjak	set	messages: + msg394308 versions: + Python 3.11, - Python 3.8
2021-05-25 09:51:59	vstinner	set	messages: + msg394305
2021-05-21 15:00:39	vstinner	set	messages: + msg394117
2021-05-21 14:59:46	vstinner	set	messages: + msg394116
2021-05-03 14:46:29	kulikjak	set	components: + Unicode, - Tests versions: + Python 3.8, Python 3.9, Python 3.10, - Python 3.11
2021-05-03 12:28:54	sujalpatel67821	set	components: + Tests, - Unicode versions: + Python 3.11, - Python 3.7, Python 3.8, Python 3.9, Python 3.10
2021-05-03 11:37:13	kulikjak	set	pull_requests: + pull_request24530
2021-04-30 13:21:48	vstinner	set	messages: + msg392429
2021-03-30 10:12:51	kulikjak	set	messages: + msg389814
2021-03-30 10:12:23	kulikjak	set	keywords: + patch stage: patch review pull_requests: + pull_request23840
2021-03-30 10:11:34	kulikjak	create