
classification
Title: test_c_locale_coercion fails when the default LC_CTYPE != "C"
Type: behavior
Stage: patch review
Components: Tests
Versions: Python 3.7
process
Status: open
Resolution:
Dependencies:
Superseder:
Assigned To:
Nosy List: erik.bray, ncoghlan, xdegaye
Priority: normal
Keywords: patch

Created on 2017-11-10 10:47 by erik.bray, last changed 2022-04-11 14:58 by admin.

Pull Requests
URL Status Linked
PR 4361 open erik.bray, 2017-11-10 10:48
PR 4369 merged ncoghlan, 2017-11-11 06:35
Messages (11)
msg306019 - Author: Erik Bray (erik.bray) * (Python triager) Date: 2017-11-10 10:47
Several of the tests in test_c_locale_coercion (particularly LocaleCoercionTests._check_c_locale_coercion) assume that the system default locale, i.e. the one selected by setlocale(category, "") when all the relevant environment variables are empty/blank, will be the "C"/"POSIX" locale.

While this is often true, POSIX does not require it. For example, Cygwin already defaults to "C.UTF-8", so these tests fail because they assume the legacy locale coercion will be used when it isn't (e.g. the LC_CTYPE environment variable does not get forced to "C.UTF-8"). In principle, however, this can affect any platform that chooses a different default.
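To see what a given platform actually picks, something along these lines could be used. It is only a rough sketch, not code from the PR: the helper name default_ctype_locale is made up for illustration, and it assumes that blanking LC_ALL, LC_CTYPE and LANG is enough to fall through to the implementation default. PYTHONCOERCECLOCALE=0 is set so that the coercion itself does not touch the child's environment first.

```python
import os
import subprocess
import sys

# Environment variables that can influence LC_CTYPE.
LOCALE_VARS = ("LC_ALL", "LC_CTYPE", "LANG")


def default_ctype_locale():
    """Report what setlocale(LC_CTYPE, "") returns in a child interpreter
    started with all the locale variables set to the empty string.

    Hypothetical helper, not part of test_c_locale_coercion.
    """
    env = dict(os.environ)
    for name in LOCALE_VARS:
        env[name] = ""
    # Disable PEP 538 locale coercion in the child process.
    env["PYTHONCOERCECLOCALE"] = "0"
    code = 'import locale; print(locale.setlocale(locale.LC_CTYPE, ""))'
    result = subprocess.run(
        [sys.executable, "-c", code],
        env=env,
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    print("default LC_CTYPE locale:", default_ctype_locale())
```

The tests in question only behave as written on platforms where this reports "C".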
msg306022 - Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2017-11-10 10:59
Issue 30672 is potentially related here - some of the test cases are already disabled on Mac OS X and other *BSD systems since the tests assume that C & POSIX are aliases of each other.

I've also added Xavier to the nosy list, since the current implementation and tests aren't quite right for Android either and it would be good to come up with a unified solution to more robust platform feature detection: https://bugs.python.org/issue28180#msg305850
msg306023 - Author: Erik Bray (erik.bray) * (Python triager) Date: 2017-11-10 11:16
Yes, I looked at some of the other issues pertaining to this, but it wasn't immediately apparent how to kill multiple birds with one stone, so here I just focused on this one assumption.
msg306027 - Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2017-11-10 12:49
OK, I'd been meaning to get back to refactoring those tests anyway, so assigning this to myself.

I'm thinking that the right way to go will be to give the test case a more explicit model of "expected platform behaviour" (initialised in setUpModule), rather than having that be implicit in a bunch of conditionals scattered throughout the individual test cases.

Then we'd have at least the following cases:

- default is C, POSIX is an alias for C (most Linux distros)
- default is C, POSIX is a separate locale (*BSD)
- default is C.UTF-8 (Cygwin, potentially Android depending on exactly how we resolve that)
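The three cases above could be captured in a small explicit model, roughly as follows. This is a sketch only: the enum and function names are invented for illustration and do not come from the test module. For the C.UTF-8 case, "POSIX" is left out simply because the list above does not say how it behaves there.

```python
import enum


class PlatformModel(enum.Enum):
    """Hypothetical names for the three cases listed above."""
    C_WITH_POSIX_ALIAS = "default is C, POSIX is an alias for C"
    C_WITH_SEPARATE_POSIX = "default is C, POSIX is a separate locale"
    C_UTF8_DEFAULT = "default is C.UTF-8"


def locales_expecting_coercion(model):
    """Return the locale settings for which the tests would expect
    the legacy C locale coercion to be triggered."""
    targets = ["C"]  # an explicit "C" is expected to coerce everywhere
    if model is not PlatformModel.C_UTF8_DEFAULT:
        targets.append("")  # the empty setting falls back to "C"
    if model is PlatformModel.C_WITH_POSIX_ALIAS:
        targets.append("POSIX")  # "POSIX" is just another name for "C"
    return targets


if __name__ == "__main__":
    for model in PlatformModel:
        print(model.name, locales_expecting_coercion(model))
```

With a model like this, each test case would iterate over one precomputed list instead of repeating platform conditionals.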
msg306036 - Author: Erik Bray (erik.bray) * (Python triager) Date: 2017-11-10 15:24
In my PR there's a behavior test for the default, so we don't have to hard-code that on a per-platform basis at least.  The C != POSIX thing I'm not sure you can easily test for.
msg306078 - Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2017-11-11 06:41
The essential problem in both this issue and issue 30672 is that the tests are currently incorporating some Linux-specific assumptions about ways to request the "C" locale.

In https://github.com/python/cpython/pull/4369, I've taken the approach of making the baseline tests only cover "C" and "invalid.ascii", and then explicitly *opt-in* to testing an empty locale and "POSIX" on Linux machines.

If that's enough to get the test passing on Cygwin, I'm inclined to leave it at that. Dynamically calculated test expectations always make me nervous, since it's all too easy to end up with bugs that impact both the test case and the expectation calculator in the same way, and hence end up with the test passing when it should really fail.
msg306079 - Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2017-11-11 06:44
Note: I'm not entirely sold on my own argument though, as I believe at least Alpine Linux already interprets the empty locale as C.UTF-8. So it may make more sense to use your dynamic check with both the empty string and "POSIX", and only test those locales if they get reported back as effectively configuring the "C" locale.
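A rough sketch of that filtering idea, under some assumptions: locales_to_test and the report_ctype callback are hypothetical names, and the three lookup tables at the bottom are made-up probe results for illustration, not measurements from real platforms.

```python
# Settings every platform would test (the baseline from PR 4369).
BASELINE = ("C", "invalid.ascii")
# Settings that only get tested if they effectively mean "C".
OPTIONAL = ("", "POSIX")


def locales_to_test(report_ctype):
    """Build the list of locale settings to exercise.

    report_ctype(setting) is a hypothetical callback returning the
    name that setlocale(LC_CTYPE, "") reports when the locale is
    configured with the given setting.
    """
    targets = list(BASELINE)
    for setting in OPTIONAL:
        if report_ctype(setting) == "C":
            targets.append(setting)
    return targets


# Made-up probe results standing in for three kinds of platform.
alias_platform = {"": "C", "POSIX": "C"}.get
separate_posix_platform = {"": "C", "POSIX": "POSIX"}.get
utf8_default_platform = {"": "C.UTF-8", "POSIX": "C.UTF-8"}.get

if __name__ == "__main__":
    for probe in (alias_platform, separate_posix_platform,
                  utf8_default_platform):
        print(locales_to_test(probe))
```

In a real test module the callback would have to query a child process started with the setting in its environment, which is exactly where the concern about the test and the expectation calculator sharing a bug comes in.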
msg306083 - Author: Xavier de Gaye (xdegaye) * (Python triager) Date: 2017-11-11 12:57
> Several of the tests in test_c_locale_coercion (particularly LocaleCoercionTests._check_c_locale_coercion) assume that the system default locale, i.e. the one selected by setlocale(category, "") when all the relevant environment variables are empty/blank, will be the "C"/"POSIX" locale.
>
> While this is often true, POSIX does not require it.

I think you are right. The section starting with "The values of locale categories shall be determined by a precedence order;" in [1] states:

4. If the LANG environment variable is not set or is set to the empty string, the implementation-defined default locale shall be used.

In the current implementation of PR 4334 [2], only one change to test_c_locale_coercion is needed to fix the failures of some subtests of test_PYTHONCOERCECLOCALE_set_to_warn when all the locale environment variables are set to the empty string. All the other tests are unchanged and pass, because the new _Py_SetLocaleFromEnv() function [3] causes Android to behave as a plain *nix platform except when the locale environment variables are unset or set to an empty string.

[1] http://pubs.opengroup.org/onlinepubs/009695399/basedefs/xbd_chap08.html
[2] PR 4334: Fix the implementation of PEP 538 on Android
[3] And because after calling setlocale(category, "C"), setlocale(category) returns "C" on Android (this may not be the case on Cygwin).
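The precedence order from [1] can be sketched as a small lookup: LC_ALL first, then the category's own variable, then LANG, and finally the implementation-defined default (item 4 quoted above). The function name and the implementation_default parameter are made up for illustration; a real platform does this inside setlocale().

```python
def requested_locale(category_var, environ, implementation_default):
    """Resolve the locale name for one category following the POSIX
    precedence order. A variable that is unset or set to the empty
    string is skipped.

    Hypothetical helper; implementation_default stands in for
    whatever the platform chooses (e.g. "C" or "C.UTF-8").
    """
    for name in ("LC_ALL", category_var, "LANG"):
        value = environ.get(name)
        if value:
            return value
    return implementation_default


if __name__ == "__main__":
    blank = {"LC_ALL": "", "LC_CTYPE": "", "LANG": ""}
    # All variables blank: the platform default decides the outcome.
    print(requested_locale("LC_CTYPE", blank, "C"))
    print(requested_locale("LC_CTYPE", blank, "C.UTF-8"))
```

This is why the same blank environment gives "C" on one platform and "C.UTF-8" on another.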
msg307785 - Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2017-12-07 00:47
For the POSIX case, we're going to fix the implementation to always handle that the same way as it does the "C" locale: https://bugs.python.org/issue30672#msg307784

So the main question to address with the refactoring here will be capturing the expected behaviour for the 'locale setting is an empty string' case.
msg308466 - Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2017-12-16 08:51
New changeset 9c19b020249c451891affd81751947321a1e6957 by Nick Coghlan in branch 'master':
bpo-32002: Refactor C locale coercion tests (GH-4369)
https://github.com/python/cpython/commit/9c19b020249c451891affd81751947321a1e6957
msg372506 - Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2020-06-28 07:34
Removing issue assignment, as I'm no longer actively investigating this.
History
Date User Action Args
2022-04-11 14:58:54 admin set github: 76183
2020-06-28 07:34:48 ncoghlan set assignee: ncoghlan -> ; messages: + msg372506
2017-12-16 08:51:23 ncoghlan set messages: + msg308466
2017-12-07 00:47:36 ncoghlan set messages: + msg307785
2017-12-07 00:39:09 ncoghlan link issue32238 dependencies
2017-11-11 12:57:26 xdegaye set messages: + msg306083
2017-11-11 06:44:57 ncoghlan set messages: + msg306079
2017-11-11 06:41:20 ncoghlan set messages: + msg306078
2017-11-11 06:35:40 ncoghlan set pull_requests: + pull_request4322
2017-11-10 15:24:25 erik.bray set messages: + msg306036
2017-11-10 12:49:37 ncoghlan set assignee: ncoghlan; messages: + msg306027
2017-11-10 11:16:39 erik.bray set messages: + msg306023
2017-11-10 10:59:20 ncoghlan set nosy: + xdegaye; messages: + msg306022
2017-11-10 10:48:45 erik.bray set keywords: + patch; stage: patch review; pull_requests: + pull_request4316
2017-11-10 10:47:52 erik.bray create