Issue 30647: CODESET error on AMD64 FreeBSD 10.x Shared 3.x caused by the PEP 538

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/74832

classification

Title:	CODESET error on AMD64 FreeBSD 10.x Shared 3.x caused by the PEP 538
Type:	behavior	Stage:	resolved
Components:	Unicode	Versions:	Python 3.7

process

Status:	closed	Resolution:	fixed
Dependencies:		Superseder:
Assigned To:	ncoghlan	Nosy List:	bapt, ezio.melotti, koobs, ncoghlan, ned.deily, vstinner
Priority:	normal	Keywords:

Created on 2017-06-13 09:01 by vstinner, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Pull Requests
URL	Status	Linked
PR 2374	merged	ncoghlan, 2017-06-24 08:27

Messages (9)
msg295870 - (view)	Author: STINNER Victor (vstinner) *	Date: 2017-06-13 09:01
Regression caused by the commit 6ea4186de32d65b1f1dc1533b6312b798d300466, bpo-28180: Implementation for PEP 538.

http://buildbot.python.org/all/builders/AMD64%20FreeBSD%2010.x%20Shared%203.x/builds/412/steps/compile/logs/stdio

Python detected LC_CTYPE=C: LC_CTYPE coerced to UTF-8 (set another locale or PYTHONCOERCECLOCALE=0 to disable this locale coercion behavior).
Fatal Python error: Py_Initialize: Unable to get the locale encoding
ValueError: CODESET is not set or empty

Current thread 0x0000000802006400 (most recent call first):
Abort trap (core dumped)
msg295873 - (view)	Author: STINNER Victor (vstinner) *	Date: 2017-06-13 09:10
On my FreeBSD 11 VM, I only have the "C" locale, not a "UTF-8 C" locale:

[haypo@freebsd ~/prog/python/master]$ locale -a|grep ^C
C

But CPython still asks me to use a nonexistent locale (newlines added for readability):

[haypo@freebsd ~/prog/python/master]$ ./python
Python runtime initialized with LC_CTYPE=C (a locale with default ASCII encoding),
which may cause Unicode compatibility problems. Using C.UTF-8, C.utf8, or UTF-8
(if available) as alternative Unicode-compatible locales is recommended.
Python 3.7.0a0 (heads/master:d79c1d4a94, Jun 13 2017, 10:59:23)
[GCC 4.2.1 Compatible FreeBSD Clang 3.8.0 (tags/RELEASE_380/final 262564)] on freebsd11
Type "help", "copyright", "credits" or "license" for more information.
>>> import locale
>>> locale.setlocale(locale.LC_CTYPE, None)
'C'
msg295925 - (view)	Author: bapt (bapt)	Date: 2017-06-13 14:56
Per POSIX, the C locale is only expected to be ASCII. C.UTF-8 is a Linux-only thing (actually I thought it was a Debian-only thing, but maybe not). I was thinking about creating a C.utf8 locale on FreeBSD, but it is not that simple to do (still doable, and an interesting idea).

Note that if it fails here, it is probably also failing on other OSes. At minimum: Dragonfly and Illumos for sure, maybe NetBSD and OpenBSD as well.

haypo, do not hesitate to ping me on IRC as usual if you want to discuss the issue.
msg295926 - (view)	Author: Ned Deily (ned.deily) *	Date: 2017-06-13 15:06
macOS is also BSD-like with regard to locales: it also does not have any C.* locales other than plain C. See, for example, the discussion at bpo-18378.
msg295928 - (view)	Author: bapt (bapt)	Date: 2017-06-13 15:17
More details here: C.UTF-8 is a glibc-only thing (https://sourceware.org/glibc/wiki/Proposals/C.UTF-8), and not even mainstream.

The closest thing to a C locale with Unicode would be to set every category to the C locale except LC_CTYPE, which would be set to en_US.UTF-8. The problem is that if your ctype data comes from CLDR, it differs per locale. On FreeBSD, Dragonfly and Illumos, we have extected it so LC_CTYPE is the same on all locales.
msg296079 - (view)	Author: Nick Coghlan (ncoghlan) *	Date: 2017-06-15 10:29
Note that the coercion logic includes a runtime check to see if 'setlocale(LC_CTYPE, "<locale_name>")' succeeds. That's how we skip over the non-existent C.UTF-8 and C.utf8 to get to "LC_CTYPE=UTF-8" on Mac OS X and FreeBSD.

That appears to work (and really does work on Mac OS X as far as CPython's test suite is concerned), but on FreeBSD we subsequently get the CODESET failure when we try to call `nl_langinfo` later in the interpreter startup process.

Victor's suggestion, which seems reasonable to me, is that we could also add the `nl_langinfo` call in the coercion logic, so that we never implicitly configure a locale setting that breaks nl_langinfo. That way, instead of the interpreter failing to start, we'd just skip the locale coercion logic in that case (and update the test suite's expectations accordingly).
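
For illustration only, the two-step check proposed in this message can be approximated from Python with the standard `locale` module. This is a rough sketch, not the C code in the interpreter: the function name `find_usable_target` and the constant `CANDIDATE_LOCALES` are hypothetical, and the candidate list is taken from the warning text quoted earlier in this issue.

```python
import locale

# Candidate target locales, as named in the startup warning quoted above.
CANDIDATE_LOCALES = ("C.UTF-8", "C.utf8", "UTF-8")


def find_usable_target(candidates=CANDIDATE_LOCALES):
    """Return the first candidate that passes both checks, else None.

    Hypothetical helper: a candidate is accepted only if setlocale()
    succeeds AND nl_langinfo(CODESET) then reports a non-empty codeset.
    """
    # Remember the current LC_CTYPE so this probe has no lasting effect.
    saved = locale.setlocale(locale.LC_CTYPE)
    # nl_langinfo and CODESET are not available on every platform.
    can_query = hasattr(locale, "nl_langinfo") and hasattr(locale, "CODESET")
    try:
        for name in candidates:
            try:
                locale.setlocale(locale.LC_CTYPE, name)
            except locale.Error:
                continue  # the locale does not exist on this system
            if can_query and not locale.nl_langinfo(locale.CODESET):
                continue  # setlocale() worked, but CODESET is empty
            return name
        return None
    finally:
        locale.setlocale(locale.LC_CTYPE, saved)
```

On a system that behaves like the failing buildbot, the expectation would be that "UTF-8" is rejected by the second check instead of being returned.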
msg296807 - (view)	Author: Nick Coghlan (ncoghlan) *	Date: 2017-06-25 06:23
Current status of the PR:

- testing suggests that "nl_langinfo(CODESET)" fails with LC_CTYPE=UTF-8 on Mac OS X as well, but that doesn't matter for Python start-up, since we hardcode UTF-8 as the locale encoding and never call nl_langinfo
- on Linux however, "nl_langinfo(CODESET)" succeeds as expected

Accordingly, I've revised the tests as follows:

- on Linux and Mac OS X, having setlocale() succeed gets a locale added to the "available target locales" set for the tests. This reflects the fact that we skip the nl_langinfo(CODESET) check on Mac OS X, and expect it to always succeed on Linux if setlocale() succeeds
- on other platforms where "locale.nl_langinfo(locale.CODESET)" is supported, we only consider a locale an available target locale if that call returns a non-empty answer

At the locale coercion level, I've added an extra check where we save the initial locale (i.e. before we change anything), and if setlocale() succeeds, but nl_langinfo(CODESET) fails, we do setlocale(LC_CTYPE, initial_locale) to try to get things back to their original state.

This seems to mostly work on FreeBSD, but doesn't quite get readline back to where it is by default, so test_nonascii in test_readline fails with the error:

```
======================================================================
FAIL: test_nonascii (test.test_readline.TestReadline)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/home/buildbot/python/custom.koobs-freebsd10/build/Lib/test/test_readline.py", line 203, in test_nonascii
    self.assertIn(b"text 't\\xeb'\r\n", output)
AssertionError: b"text 't\\xeb'\r\n" not found in bytearray(b"^A^B^B^B^B^B^B^B\t\tx\t\r\n[\\357nserted]|t\x07\x08\x08\x08\x08\x08\x08\x08\x07\x07xrted]|t\x08\x08\x08\x08\x08\x08\x08\x07\r\nresult \'[\\udcefnsexrted]|t\'\r\nhistory \'[\\xefnsexrted]|t\'\r\n")
```

My two current guesses as to what may be going wrong there are:

* doing the equivalent of "setlocale(LC_CTYPE, setlocale(LC_CTYPE, NULL))" may be taking libc out of the weird initial state where it claims to be using ASCII, but is really using latin-1; or
* setting "surrogateescape" on "stdin" is causing some unexpected behaviour in the affected test case

I'm leaning towards the former, as if it were the latter, I'd expect to have already seen the same error without locale coercion.
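
The revised test expectations described in this message could be sketched roughly as follows. This is not the code from the PR: `available_target_locales` and `CANDIDATES` are hypothetical names, and treating a locale as available on platforms that lack `nl_langinfo` altogether is an assumption, since the message does not cover that case.

```python
import locale
import sys

# Candidate names taken from the startup warning quoted earlier in this issue.
CANDIDATES = ("C.UTF-8", "C.utf8", "UTF-8")


def available_target_locales(candidates=CANDIDATES, platform=sys.platform):
    """Return the candidates the tests would treat as available targets."""
    # Linux and Mac OS X: a successful setlocale() is considered enough.
    trust_setlocale = platform.startswith("linux") or platform == "darwin"
    can_query = hasattr(locale, "nl_langinfo") and hasattr(locale, "CODESET")
    saved = locale.setlocale(locale.LC_CTYPE)
    available = []
    try:
        for name in candidates:
            try:
                locale.setlocale(locale.LC_CTYPE, name)
            except locale.Error:
                continue  # locale not installed
            if trust_setlocale or not can_query:
                # "not can_query" is the assumption noted in the lead-in.
                available.append(name)
            elif locale.nl_langinfo(locale.CODESET):
                # Elsewhere: require a non-empty CODESET answer.
                available.append(name)
    finally:
        # Leave the process locale as it was before probing.
        locale.setlocale(locale.LC_CTYPE, saved)
    return available
```

The `platform` parameter exists only so the branching can be exercised with a different platform string than the one the code is running on.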
msg297272 - (view)	Author: Nick Coghlan (ncoghlan) *	Date: 2017-06-29 14:48
New changeset 18974c35ad9d25ffea041dc0363dc01889f4a595 by Nick Coghlan in branch 'master': bpo-30647: Check nl_langinfo(CODESET) in locale coercion (GH-2374) https://github.com/python/cpython/commit/18974c35ad9d25ffea041dc0363dc01889f4a595
msg297273 - (view)	Author: Nick Coghlan (ncoghlan) *	Date: 2017-06-29 14:52
I was able to fix the test_readline failure by restoring the locale based on the environment settings with `setlocale(LC_CTYPE, "")` rather than the return value from a preceding call to `setlocale(LC_CTYPE, NULL)`. That means we can leave the runtime coercion checks enabled on *BSD systems, and if/when any given BSD variant adds working Linux-style C.UTF-8 or OS-X-style UTF-8 locales, we'll automatically start using them.
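
As a rough Python analogue of the final approach (the actual change is in the interpreter's C startup code), the sketch below falls back to the environment-derived locale via `setlocale(LC_CTYPE, "")` when a candidate passes `setlocale()` but reports an empty CODESET. The name `coerce_ctype` is hypothetical, and the sketch assumes a Unix-like platform where `locale.nl_langinfo` and `locale.CODESET` exist.

```python
import locale


def coerce_ctype(candidates=("C.UTF-8", "C.utf8", "UTF-8")):
    """Switch LC_CTYPE to the first working candidate; return it or None.

    Unlike a pure probe, a successful call leaves LC_CTYPE changed.
    """
    for name in candidates:
        try:
            locale.setlocale(locale.LC_CTYPE, name)
        except locale.Error:
            continue  # candidate does not exist here
        if locale.nl_langinfo(locale.CODESET):
            return name  # keep the new setting
        # setlocale() succeeded but CODESET is empty: undo the change by
        # re-reading the environment rather than reusing a saved return value.
        try:
            locale.setlocale(locale.LC_CTYPE, "")
        except locale.Error:
            # The environment itself names an unsupported locale;
            # nothing more can be restored from it.
            pass
    return None
```

A caller would treat a `None` result as "skip coercion and continue with the configured locale".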

History
Date	User	Action	Args
2022-04-11 14:58:47	admin	set	github: 74832
2017-06-29 14:52:41	ncoghlan	set	status: open -> closed type: behavior messages: + msg297273 resolution: fixed stage: resolved
2017-06-29 14:48:17	ncoghlan	set	messages: + msg297272
2017-06-25 06:23:17	ncoghlan	set	messages: + msg296807
2017-06-24 08:27:28	ncoghlan	set	pull_requests: + pull_request2421
2017-06-18 04:09:14	ncoghlan	set	assignee: ncoghlan
2017-06-18 01:26:26	ncoghlan	link	issue30672 dependencies
2017-06-15 10:29:26	ncoghlan	set	nosy: + ncoghlan messages: + msg296079
2017-06-13 15:17:29	bapt	set	messages: + msg295928
2017-06-13 15:06:26	ned.deily	set	nosy: + ned.deily messages: + msg295926
2017-06-13 14:56:24	bapt	set	nosy: + bapt messages: + msg295925
2017-06-13 12:54:59	ncoghlan	link	issue28180 dependencies
2017-06-13 09:10:31	vstinner	set	nosy: + koobs
2017-06-13 09:10:09	vstinner	set	messages: + msg295873
2017-06-13 09:01:31	vstinner	create