classification
Title: Solaris: Fix broken Unicode encoding in non-UTF locales
Type:
Stage: resolved
Components: Unicode
Versions: Python 3.11, Python 3.10, Python 3.9
process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To:
Nosy List: ezio.melotti, kulikjak, miss-islington, pablogsal, vstinner
Priority: normal
Keywords: patch

Created on 2021-03-30 10:11 by kulikjak, last changed 2021-06-20 20:12 by pablogsal. This issue is now closed.

Pull Requests
URL Status Linked
PR 25096 merged kulikjak, 2021-03-30 10:12
PR 25847 merged kulikjak, 2021-05-03 11:37
PR 26405 merged kulikjak, 2021-05-27 15:40
PR 26409 merged miss-islington, 2021-05-27 17:08
PR 26410 merged miss-islington, 2021-05-27 17:08
PR 26498 merged miss-islington, 2021-06-02 23:47
Messages (13)
msg389813 - (view) Author: Jakub Kulik (kulikjak) * Date: 2021-03-30 10:11
On Linux, wchar_t values correspond to Unicode code points (UTF-32); however, that does not have to be the case, as the C standard allows an arbitrary representation to be used, and Solaris makes use of that freedom.

In Oracle Solaris, the internal form of wchar_t is specific to a locale; in the Unicode locales, wchar_t has the UTF-32 Unicode encoding form, and other locales have different representations [1].

This is an issue because Python expects wchar_t to correspond to Unicode, which on Oracle Solaris with a non-UTF locale results either in errors (values are outside the Unicode range) or in output with the wrong symbols.
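The first failure mode can be illustrated in isolation: a Python string can only hold code points up to U+10FFFF, so a raw wchar_t value beyond that limit cannot be turned into a character. The following is a minimal sketch; `to_char` is a hypothetical helper written for illustration and is not part of the patch.

```python
import sys


def to_char(value: int) -> str:
    """Convert an integer to a one-character str, treating it as a code point.

    Hypothetical helper: it only mirrors the range check that makes
    out-of-range wchar_t values fail.
    """
    if not 0 <= value <= sys.maxunicode:
        raise ValueError(
            f"character U+{value:x} is not in range "
            f"[U+0000; U+{sys.maxunicode:x}]"
        )
    return chr(value)
```

Values inside the range but with a different meaning than Unicode would pass this check and silently produce the wrong symbol, which is the second failure mode described above.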

Unicode locales work as expected, but they are not an acceptable workaround for those Oracle Solaris users who cannot use a Unicode encoding for various reasons.
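To tell the two kinds of locale apart from Python, one option is to look at the codeset name of the active locale. This is a rough sketch under stated assumptions: `codeset_is_unicode` and `current_codeset` are hypothetical helpers, and matching on the substring "UTF" is a heuristic rather than the check used by the patch.

```python
import locale


def codeset_is_unicode(codeset: str) -> bool:
    """Heuristic (assumption): a codeset whose name mentions UTF is Unicode."""
    return "UTF" in codeset.upper()


def current_codeset() -> str:
    """Return the codeset name of the active locale."""
    # locale.nl_langinfo is only available on some Unix platforms.
    if hasattr(locale, "nl_langinfo"):
        return locale.nl_langinfo(locale.CODESET)
    return locale.getpreferredencoding(False)
```

For example, `codeset_is_unicode(current_codeset())` could be used to decide whether a test for this issue is relevant in the current environment.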


Because of that, we fixed it a few months ago with a patch to `PyUnicode_FromWideChar`, which handles the conversion to Unicode (attached in the PR). It has been tested over the last half a year, and we haven't seen any related issues since.

Is something like this acceptable, or should it be fixed in a different place or in a different way? All comments are appreciated.

[1] https://docs.oracle.com/cd/E36784_01/html/E39536/gmwkm.html
msg389814 - (view) Author: Jakub Kulik (kulikjak) * Date: 2021-03-30 10:12
I forgot to mention: this affects Oracle Solaris. I tested this on SmartOS and cannot reproduce it there, as it seems to use the Unicode representation for all locales. Based on the documentation, this might also affect other systems (e.g. HP-UX specifically says: 'These values may not be compatible with values obtained by specifying other locales that are supported'), but it's hard to tell without testing that.

This one-liner fails with ValueError: character U+30000069 is not in range [U+0000; U+10ffff] if the issue is present:
python3.7 -c 'import datetime; import locale; locale.setlocale(locale.LC_ALL,"es_ES.ISO8859-1"); datetime.date(2001, 1, 3).strftime("%a")'
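The same check can be written as a small script that copes with the locale not being installed and restores the previous locale afterwards. This is only a sketch: `weekday_abbrev` is a hypothetical helper, and returning `None` for a missing locale is an assumption made for convenience.

```python
import datetime
import locale


def weekday_abbrev(locale_name: str = "es_ES.ISO8859-1"):
    """Return the abbreviated weekday of 2001-01-03 in the given locale.

    Returns None if the locale is not installed. On an affected build
    the strftime() call is expected to raise ValueError instead.
    """
    previous = locale.setlocale(locale.LC_ALL)  # query the current setting
    try:
        locale.setlocale(locale.LC_ALL, locale_name)
    except locale.Error:
        return None  # locale not available on this system
    try:
        return datetime.date(2001, 1, 3).strftime("%a")
    finally:
        # Restore whatever locale was active before the check.
        locale.setlocale(locale.LC_ALL, previous)
```

Calling `weekday_abbrev()` on a system with the fix applied should return a short string rather than raising.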
msg392429 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-04-30 13:21
New changeset 9032cf5cb1e33c0349089cfb0f6bf11ed3c30e86 by Jakub Kulík in branch 'master':
bpo-43667: Fix broken Unicode encoding in non-UTF locales on Solaris (GH-25096)
https://github.com/python/cpython/commit/9032cf5cb1e33c0349089cfb0f6bf11ed3c30e86
msg394116 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-05-21 14:59
New changeset d3cc68900dc99966007112f884779895daefc7db by Jakub Kulík in branch '3.9':
[3.9] bpo-43667: Fix broken Unicode encoding in non-UTF locales on Solaris (GH-25096) (GH-25847)
https://github.com/python/cpython/commit/d3cc68900dc99966007112f884779895daefc7db
msg394117 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-05-21 15:00
A backport to 3.8 may be more complicated. It's up to you to decide whether you want to backport it. I merged your 3.9 backport; it looks very close to the change made in the main branch.
msg394305 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-05-25 09:51
Do you want to attempt to backport the fix to 3.8, or can this issue be closed?
msg394308 - (view) Author: Jakub Kulik (kulikjak) * Date: 2021-05-25 09:59
Sorry for the delayed response.

Considering that we are not delivering or using 3.8 in any way and this issue doesn't seem to impact anybody else, we can omit the backport to 3.8. I will prepare another PR with a news fragment, and after that, this can be considered solved and closed.
msg394309 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-05-25 10:02
I'm closing the issue, but you can still reference the bpo issue number in your PR with the changelog (NEWS) entry.
msg394572 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-05-27 17:08
New changeset 164a4f46d1606e21d82babc010e397a9116e6730 by Jakub Kulík in branch 'main':
bpo-43667: Add news fragment for Solaris changes (GH-26405)
https://github.com/python/cpython/commit/164a4f46d1606e21d82babc010e397a9116e6730
msg394576 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-05-27 17:23
New changeset 0574b0686d76e6f9199f800b5f32bd56eaff3c77 by Miss Islington (bot) in branch '3.10':
bpo-43667: Add news fragment for Solaris changes (GH-26405) (GH-26409)
https://github.com/python/cpython/commit/0574b0686d76e6f9199f800b5f32bd56eaff3c77
msg394577 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-05-27 17:23
New changeset 427232f9d221d54870fa3e89bd1dac55cf42243f by Miss Islington (bot) in branch '3.9':
bpo-43667: Add news fragment for Solaris changes (GH-26405) (GH-26410)
https://github.com/python/cpython/commit/427232f9d221d54870fa3e89bd1dac55cf42243f
msg394578 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-05-27 17:24
I merged your PR and backported it to add a NEWS entry, thanks.
msg396193 - (view) Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2021-06-20 20:12
New changeset f87d2038fadd9c067d50fb2f1d7c2f37b9f3893a by Miss Islington (bot) in branch '3.10':
bpo-43667: Add news fragment for Solaris changes (GH-26405) (GH-26498)
https://github.com/python/cpython/commit/f87d2038fadd9c067d50fb2f1d7c2f37b9f3893a
History
Date User Action Args
2021-06-20 20:12:16 pablogsal set nosy: + pablogsal
    messages: + msg396193
2021-06-02 23:47:47 miss-islington set pull_requests: + pull_request25094
2021-05-27 17:24:17 vstinner set messages: + msg394578
2021-05-27 17:23:55 vstinner set messages: + msg394577
2021-05-27 17:23:50 vstinner set messages: + msg394576
2021-05-27 17:08:53 miss-islington set pull_requests: + pull_request25004
2021-05-27 17:08:42 miss-islington set nosy: + miss-islington
    pull_requests: + pull_request25003
2021-05-27 17:08:23 vstinner set messages: + msg394572
2021-05-27 15:40:07 kulikjak set pull_requests: + pull_request24998
2021-05-25 10:02:40 vstinner set status: open -> closed
    resolution: fixed
    messages: + msg394309
    stage: patch review -> resolved
2021-05-25 09:59:08 kulikjak set messages: + msg394308
    versions: + Python 3.11, - Python 3.8
2021-05-25 09:51:59 vstinner set messages: + msg394305
2021-05-21 15:00:39 vstinner set messages: + msg394117
2021-05-21 14:59:46 vstinner set messages: + msg394116
2021-05-03 14:46:29 kulikjak set components: + Unicode, - Tests
    versions: + Python 3.8, Python 3.9, Python 3.10, - Python 3.11
2021-05-03 12:28:54 sujalpatel67821 set components: + Tests, - Unicode
    versions: + Python 3.11, - Python 3.7, Python 3.8, Python 3.9, Python 3.10
2021-05-03 11:37:13 kulikjak set pull_requests: + pull_request24530
2021-04-30 13:21:48 vstinner set messages: + msg392429
2021-03-30 10:12:51 kulikjak set messages: + msg389814
2021-03-30 10:12:23 kulikjak set keywords: + patch
    stage: patch review
    pull_requests: + pull_request23840
2021-03-30 10:11:34 kulikjak create