Message 296807 - Python tracker


Author	ncoghlan
Recipients	bapt, ezio.melotti, koobs, ncoghlan, ned.deily, vstinner
Date	2017-06-25.06:23:16
Message-id	<1498371797.16.0.515908927386.issue30647@psf.upfronthosting.co.za>
In-reply-to

Content

Current status of the PR:

- testing suggests that "nl_langinfo(CODESET)" fails with LC_CTYPE=UTF-8 on Mac OS X as well, but that doesn't matter for Python start-up, since we hardcode UTF-8 as the locale encoding and never call nl_langinfo
- on Linux, however, "nl_langinfo(CODESET)" succeeds as expected

Accordingly, I've revised the tests as follows:

- on Linux and Mac OS X, a locale is added to the "available target locales" set for the tests whenever setlocale() succeeds for it. This reflects the fact that we skip the nl_langinfo(CODESET) check on Mac OS X, and expect it to always succeed on Linux if setlocale() succeeds
- on other platforms where "locale.nl_langinfo(locale.CODESET)" is supported, we only consider a locale an available target locale if that call returns a non-empty answer
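
As a rough illustration of the test-side rule described in the two bullets above, here is a minimal sketch. It is not the code from the PR: the helper name `is_available_target` is hypothetical, and the behaviour on platforms without `locale.nl_langinfo` (falling back to the setlocale() result) is an assumption the message doesn't cover.

```python
import locale
import sys


def is_available_target(name):
    """Hypothetical helper: is `name` usable as a coercion target here?"""
    # Query the current LC_CTYPE setting so it can be restored afterwards.
    saved = locale.setlocale(locale.LC_CTYPE)
    try:
        try:
            locale.setlocale(locale.LC_CTYPE, name)
        except locale.Error:
            return False  # setlocale() failed: never a target locale
        if sys.platform.startswith("linux") or sys.platform == "darwin":
            # Linux / Mac OS X: setlocale() succeeding is treated as enough.
            return True
        if hasattr(locale, "nl_langinfo"):
            # Other platforms: also require a non-empty CODESET answer.
            return bool(locale.nl_langinfo(locale.CODESET))
        # Assumption (not stated in the message): with no nl_langinfo,
        # fall back to the setlocale() result alone.
        return True
    finally:
        locale.setlocale(locale.LC_CTYPE, saved)
```

Something like `[n for n in ("C.UTF-8", "C.utf8", "UTF-8") if is_available_target(n)]` would then give the set of locales the tests can expect coercion to pick from on the current machine.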

At the locale coercion level, I've added an extra check where we save the initial locale (i.e. before we change anything), and if setlocale() succeeds, but nl_langinfo(CODESET) fails, we do setlocale(LC_CTYPE, initial_locale) to try to get things back to their original state.
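
The real check lives in the C start-up code, but the save/try/roll-back sequence can be sketched in Python for illustration. The function name `try_coerce_ctype` is hypothetical, and skipping the CODESET check when `locale.nl_langinfo` is missing is an assumption of this sketch.

```python
import locale


def try_coerce_ctype(target):
    """Hypothetical sketch: switch LC_CTYPE to `target`, rolling back
    if the new locale reports no codeset."""
    # Save the initial locale before changing anything.
    initial = locale.setlocale(locale.LC_CTYPE)
    try:
        locale.setlocale(locale.LC_CTYPE, target)
    except locale.Error:
        return False  # setlocale() failed, so nothing was changed
    if hasattr(locale, "nl_langinfo"):
        if not locale.nl_langinfo(locale.CODESET):
            # setlocale() succeeded but CODESET is empty: try to get
            # back to the original state.
            locale.setlocale(locale.LC_CTYPE, initial)
            return False
    # Assumption: without nl_langinfo the check is simply skipped.
    return True
```

Note that the roll-back is itself a setlocale() call with the saved name, which is exactly the step that may not be a true no-op on FreeBSD.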

This seems to *mostly* work on FreeBSD, but doesn't quite get readline back to its default state, so test_nonascii in test_readline fails with the error:

```
======================================================================
FAIL: test_nonascii (test.test_readline.TestReadline)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/home/buildbot/python/custom.koobs-freebsd10/build/Lib/test/test_readline.py", line 203, in test_nonascii
    self.assertIn(b"text 't\\xeb'\r\n", output)
AssertionError: b"text 't\\xeb'\r\n" not found in bytearray(b"^A^B^B^B^B^B^B^B\t\tx\t\r\n[\\357nserted]|t\x07\x08\x08\x08\x08\x08\x08\x08\x07\x07xrted]|t\x08\x08\x08\x08\x08\x08\x08\x07\r\nresult \'[\\udcefnsexrted]|t\'\r\nhistory \'[\\xefnsexrted]|t\'\r\n")

```

My two current guesses as to what may be going wrong there are:

* doing the equivalent of "setlocale(LC_CTYPE, setlocale(LC_CTYPE, NULL))" may be taking libc out of the weird initial state where it claims to be using ASCII, but is really using latin-1; or
* setting "surrogateescape" on "stdin" is causing some unexpected behaviour in the affected test case

I'm leaning towards the former, since if it were the latter, I'd expect to have already seen the same error *without* locale coercion.
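
One cheap way to start probing the first guess is to snapshot what libc *reports* before and after the supposedly no-op reset. This is only a sketch: the function name `ctype_round_trip` is hypothetical, it assumes a POSIX platform where `locale.nl_langinfo` exists, and it can only show the reported name and codeset. If libc claims ASCII while actually decoding as latin-1, confirming that would need a C-level check of the conversion functions.

```python
import locale


def ctype_round_trip():
    """Hypothetical diagnostic: report (locale name, codeset) before and
    after setlocale(LC_CTYPE, setlocale(LC_CTYPE, NULL))."""

    def snapshot():
        name = locale.setlocale(locale.LC_CTYPE)  # query only, no change
        codeset = locale.nl_langinfo(locale.CODESET)
        return name, codeset

    before = snapshot()
    # Re-apply the name libc just reported for the current locale.
    locale.setlocale(locale.LC_CTYPE, before[0])
    after = snapshot()
    return before, after
```

Running `print(ctype_round_trip())` on the FreeBSD buildbot, with and without a locale set in the environment, would show whether the reported state changes across the reset.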

History
Date	User	Action	Args
2017-06-25 06:23:17	ncoghlan	set	recipients: + ncoghlan, vstinner, ned.deily, ezio.melotti, koobs, bapt
2017-06-25 06:23:17	ncoghlan	set	messageid: <1498371797.16.0.515908927386.issue30647@psf.upfronthosting.co.za>
2017-06-25 06:23:17	ncoghlan	link	issue30647 messages
2017-06-25 06:23:16	ncoghlan	create