classification
Title: Handle "POSIX" in the legacy locale detection
Type: behavior Stage: test needed
Components: FreeBSD, Interpreter Core, macOS, Unicode Versions: Python 3.8, Python 3.7
process
Status: open Resolution:
Dependencies: 32002 Superseder: PEP 538: Unexpected locale behaviour on *BSD (including Mac OS X)
View: 30672
Assigned To: Nosy List: ezio.melotti, jwilk, koobs, ncoghlan, ned.deily, ronaldoussoren, vstinner
Priority: normal Keywords:

Created on 2017-12-07 00:28 by ncoghlan, last changed 2019-10-11 21:32 by vstinner.

Messages (4)
msg307781 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2017-12-07 00:28
Right now, the legacy locale detection introduced in PEP 538 doesn't trigger for "LANG=POSIX" and "LC_CTYPE=POSIX" on macOS and other *BSD systems.

This is because we're looking specifically for "C" as the response from "setlocale(LC_CTYPE, NULL)", which works on Linux (where glibc reports "C" if you configured "POSIX"), but not on *BSD systems (where POSIX and C behave the same way, but are still reported as distinct locales).

As per Jakub Wilk's comments at https://mail.python.org/pipermail/python-dev/2017-December/151105.html, this isn't right: we should allow either string to be returned from setlocale, and consider both of them as indicating a legacy locale to be coerced to an explicitly UTF-8 based one if possible.
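
The proposed check can be sketched roughly as follows. This is only an illustration in Python, not the C code CPython runs at startup; the names `LEGACY_LOCALE_NAMES`, `is_legacy_locale` and `current_ctype_is_legacy` are hypothetical. It assumes that `locale.setlocale(locale.LC_CTYPE, None)` reports the same string that the C-level `setlocale(LC_CTYPE, NULL)` query would.

```python
import locale

# Both spellings name the same legacy locale. glibc reports "C" even when
# "POSIX" was configured, while *BSD systems may report "POSIX" as-is.
LEGACY_LOCALE_NAMES = frozenset({"C", "POSIX"})


def is_legacy_locale(name):
    """Return True if the reported LC_CTYPE name is a legacy locale."""
    return name in LEGACY_LOCALE_NAMES


def current_ctype_is_legacy():
    """Query the current LC_CTYPE setting without changing it."""
    # Passing None as the second argument only reads the current setting.
    return is_legacy_locale(locale.setlocale(locale.LC_CTYPE, None))
```

Comparing against a set of names rather than the single string "C" is what lets the same check work on both glibc and *BSD systems.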
msg307782 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2017-12-07 00:39
Added a dependency on https://bugs.python.org/issue32002, as we should finish the test case refactoring proposed there before adjusting the `POSIX` locale handling on macOS and other *BSD systems.
msg307783 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2017-12-07 00:42
Oops, I forgot I already had an open issue for this discrepancy - I just hadn't decided how to resolve it yet.

Marking as a duplicate of https://bugs.python.org/issue30672
msg354502 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-10-11 21:32
In Python 3.8, if the LC_CTYPE locale is "POSIX", the default stdio error handler is now "surrogateescape" instead of "strict", and UTF-8 Mode is now enabled. In short, LC_CTYPE="POSIX" now behaves as LC_CTYPE="C".

This change impacts at least FreeBSD. If I understand correctly, when none of the LC_ALL, LC_CTYPE or LANG environment variables is set on FreeBSD, the LC_CTYPE locale is "POSIX".

See bpo-34485, bpo-19977 and the "POSIX locale on FreeBSD" section of my article:
https://vstinner.github.io/python3-locales-encodings.html
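
One way to observe the behaviour described above is to start a child interpreter with LC_CTYPE=POSIX and print its stdio settings. This is a rough sketch, not part of the CPython test suite; the helper name `probe_posix_locale` is hypothetical, and the values reported depend on the Python version and platform, so no expected output is shown.

```python
import json
import os
import subprocess
import sys

# Code run in the child: report UTF-8 Mode and the stdio error handlers.
CHILD_CODE = (
    "import sys, json; "
    "print(json.dumps({"
    "'utf8_mode': sys.flags.utf8_mode, "
    "'stdin_errors': sys.stdin.errors, "
    "'stdout_errors': sys.stdout.errors}))"
)


def probe_posix_locale():
    """Run a child interpreter with LC_CTYPE=POSIX and return its settings."""
    env = dict(os.environ)
    # Remove variables that would override or mask the LC_CTYPE setting.
    for name in ("LC_ALL", "LANG", "PYTHONUTF8",
                 "PYTHONIOENCODING", "PYTHONCOERCECLOCALE"):
        env.pop(name, None)
    env["LC_CTYPE"] = "POSIX"
    result = subprocess.run(
        [sys.executable, "-c", CHILD_CODE],
        env=env,
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    return json.loads(result.stdout)


if __name__ == "__main__":
    print(probe_posix_locale())
```

Running it under Python 3.7 and again under 3.8 on the same system should show whether the "POSIX" locale is treated like "C".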
History
Date                 User      Action  Args
2019-10-11 21:32:55  vstinner  set     messages: + msg354502
2017-12-07 11:58:32  jwilk     set     nosy: + jwilk
2017-12-07 00:42:10  ncoghlan  set     superseder: PEP 538: Unexpected locale behaviour on *BSD (including Mac OS X)
                                       messages: + msg307783
2017-12-07 00:39:09  ncoghlan  set     dependencies: + test_c_locale_coercion fails when the default LC_CTYPE != "C"
                                       messages: + msg307782
2017-12-07 00:28:35  ncoghlan  create