classification
Title: [Windows] test_locale.TestMiscellaneous.test_getsetlocale_issue1813() fails
Type: behavior Stage: resolved
Components: Library (Lib), Windows Versions: Python 3.10, Python 3.9, Python 3.8
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: tim.golden Nosy List: db3l, eryksun, guy.linton, jkloth, lukasz.langa, methane, miss-islington, ned.deily, paul.moore, serhiy.storchaka, steve.dower, terry.reedy, tim.golden, vstinner, xtreak, zach.ware
Priority: Keywords: patch

Created on 2019-08-25 17:33 by tim.golden, last changed 2021-03-31 22:47 by vstinner. This issue is now closed.

Files
File name Uploaded Description
pythoninfo.txt tim.golden, 2019-08-25 17:33 python -mtest.pythoninfo
Pull Requests
URL Status Linked
PR 25110 merged vstinner, 2021-03-31 09:54
PR 25112 merged miss-islington, 2021-03-31 11:02
PR 25113 merged miss-islington, 2021-03-31 11:02
Messages (38)
msg350466 - (view) Author: Tim Golden (tim.golden) * (Python committer) Date: 2019-08-25 17:33
On a Win10 machine I'm consistently seeing test_locale (and test__locale) fail. I'll attach pythoninfo.

======================================================================
ERROR: test_getsetlocale_issue1813 (test.test_locale.TestMiscellaneous)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "C:\Users\tim\work-in-progress\cpython\lib\test\test_locale.py", line 531, in test_getsetlocale_issue1813
    locale.setlocale(locale.LC_CTYPE, loc)
  File "C:\Users\tim\work-in-progress\cpython\lib\locale.py", line 604, in setlocale
    return _setlocale(category, locale)
locale.Error: unsupported locale setting
msg350470 - (view) Author: Tim Golden (tim.golden) * (Python committer) Date: 2019-08-25 19:29
Ok; so basically this doesn't work:

<code>
import locale
locale.setlocale(locale.LC_CTYPE, locale.getdefaultlocale())
</code>

It gives "locale.Error: unsupported locale setting" which comes from https://github.com/python/cpython/blob/master/Modules/_localemodule.c#L107

(For locale.getdefaultlocale() you could substitute locale.getlocale() or simply ("en_GB", "cp1252")). On my machine it raises that exception on Python 2.7.15, 3.6.6 and on master. 

Interestingly, none of the other tests in test_locale appear to exercise the 2-tuple form of setlocale's second parameter. When you call setlocale and it returns the previous setting, it's a single string, e.g. "en_GB". Passing that back in works. But getlocale returns the 2-tuple, e.g. ("en_GB", "cp1252"), and all the other tests use the setlocale-returns-current trick for their setup/teardown.

I've quickly tested on 3.5 on Linux and the 2-tuple version works ok. I assume it's working on buildbots or we'd see the Turkish test failing every time. So is there something different about my C runtime, I wonder?
msg350471 - (view) Author: Tim Golden (tim.golden) * (Python committer) Date: 2019-08-25 19:34
Just to save you looking, the code in https://github.com/python/cpython/blob/master/Modules/_localemodule.c#L107 converts the 2-tuple to lang.encoding form so the C module is seeing "en_GB.cp1252"
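
To make the failing step concrete, here is a minimal sketch of how a 2-tuple ends up as the "lang.encoding" string that the C module sees. `join_locale_tuple` is a hypothetical stand-in written for illustration; it is not the standard library code, and it ignores edge cases such as a missing language.

```python
def join_locale_tuple(localetuple):
    """Join a (language, encoding) tuple into 'language.encoding'.

    Illustration only: the name and the simplified behaviour are assumptions.
    """
    language, encoding = localetuple
    if encoding is None:
        # No encoding part: the language string is passed through unchanged.
        return language
    return f"{language}.{encoding}"


print(join_locale_tuple(("en_GB", "cp1252")))  # en_GB.cp1252
print(join_locale_tuple(("en_GB", None)))      # en_GB
```

The first result is the string that setlocale rejects on the machine described above; the second is the single-string form that works.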
msg350485 - (view) Author: Eryk Sun (eryksun) * (Python triager) Date: 2019-08-26 05:11
locale.normalize is generally wrong in Windows. It's meant for POSIX systems. Currently "tr_TR" is parsed as follows:

    >>> locale._parse_localename('tr_TR')
    ('tr_TR', 'ISO8859-9')

The encoding "ISO8859-9" is meaningless to Windows. Also, the old CRT only ever supported either full language/country names or non-standard abbreviations -- e.g. either "Turkish_Turkey" or "trk_TUR". Having locale.getdefaultlocale() return ISO two-letter codes (e.g. "en_GB") was fundamentally wrong for the old CRT. (2.7 will die with this wart.)

3.5+ uses the Universal CRT, which does support standard ISO codes, but only in BCP 47 [1] locale names of the following form:

    language           ISO 639
    ["-" script]       ISO 15924
    ["-" region]       ISO 3166-1

BCP 47 locale names have been preferred by Windows for the past 13 years, since Vista was released. Windows extends BCP 47 with a non-standard sort-order field (e.g. "de-Latn-DE_phoneb" is the German language with Latin script in the region of Germany with phone-book sort order). Another departure from strict BCP 47 in Windows is allowing underscore to be used as the delimiter instead of hyphen. 

In a concession to existing C code, the Universal CRT also supports an encoding suffix in BCP 47 locales, but this can only be either ".utf-8" or ".utf8". (Windows itself does not support specifying an encoding in a locale name, but it's Unicode anyway.) No other encoding is allowed. If ".utf-8" isn't specified, a BCP 47 locale defaults to the locale's ANSI codepage. However, there's no way to convey this in the locale name itself. Also, if a locale is Unicode only (e.g. Hindi), the CRT implicitly uses UTF-8 even without the ".utf-8" suffix.

The following are valid BCP 47 locale names in the CRT: "tr", "tr.utf-8", "tr-TR", "tr_TR", "tr_TR.utf8", or "tr-Latn-TR.utf-8". But note that "tr_TR.1254" is not supported.

The following shows that omitting the optional "utf-8" encoding in a BCP 47 locale makes the CRT default to the associated ANSI codepage. 

    >>> import ctypes, locale
    >>> ucrt = ctypes.CDLL('ucrtbase')
    >>> locale.setlocale(locale.LC_CTYPE, 'tr_TR')
    'tr_TR'
    >>> ucrt.___lc_codepage_func()
    1254

C ___lc_codepage_func() queries the codepage of the current locale. We can directly query this codepage for a BCP 47 locale via GetLocaleInfoEx:

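    >>> import ctypes
    >>> kernel32 = ctypes.WinDLL('kernel32', use_last_error=True)
    >>> LOCALE_IDEFAULTANSICODEPAGE = 0x1004
    >>> cpstr = (ctypes.c_wchar * 6)()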
    >>> kernel32.GetLocaleInfoEx('tr-TR',
    ...     LOCALE_IDEFAULTANSICODEPAGE, cpstr, len(cpstr))
    5
    >>> cpstr.value
    '1254'

If the result is '0', it's a Unicode-only locale (e.g. 'hi-IN' -- Hindi, India). Recent versions of the CRT use UTF-8 (codepage 65001) for Unicode-only locales:

    >>> locale.setlocale(locale.LC_CTYPE, 'hi-IN')
    'hi-IN'
    >>> ucrt.___lc_codepage_func()
    65001

Here are some example locale tuples that should be supported, given that the CRT continues to support full English locale names and non-standard abbreviations, in addition to the new BCP 47 names:

    ('tr', None)
    ('tr_TR', None)
    ('tr_Latn_TR', None)
    ('tr_TR', 'utf-8')
    
    ('trk_TUR', '1254')
    ('Turkish_Turkey', '1254')

The return value from C setlocale can be normalized to replace hyphen delimiters with underscores, and "utf8" can be normalized as "utf-8". If it's a BCP 47 locale that has no encoding, GetLocaleInfoEx can be called to query the ANSI codepage. UTF-8 can be assumed if it's a Unicode-only locale. 

As to prefixing a codepage with 'cp', we don't really need to do this. We have aliases defined for most, such as '1252' -> 'cp1252'. But if the 'cp' prefix does get added, then the locale module should at least know to remove it when building a locale name from a tuple.

[1] https://tools.ietf.org/rfc/bcp/bcp47.txt
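
The normalization of the C setlocale return value described above could look roughly like the following. This is a sketch under assumptions: `normalize_crt_result` is a hypothetical helper, and it only replaces hyphens when the name has no underscore, on the assumption that such a name is a BCP 47 tag rather than a classic long name. It does not query the ANSI codepage.

```python
def normalize_crt_result(name):
    """Normalize a locale string returned by C setlocale (sketch).

    - hyphen delimiters in a BCP 47 style name become underscores
    - a "utf8" suffix (any case) is spelled "utf-8"
    Everything else is returned unchanged.
    """
    language, sep, encoding = name.partition(".")
    if "_" not in language:
        # Assumption: no underscore means a hyphen-delimited BCP 47 tag.
        language = language.replace("-", "_")
    if sep and encoding.lower() in ("utf8", "utf-8"):
        encoding = "utf-8"
    return language + sep + encoding


print(normalize_crt_result("tr-TR"))                # tr_TR
print(normalize_crt_result("tr-Latn-TR.utf8"))      # tr_Latn_TR.utf-8
print(normalize_crt_result("Turkish_Turkey.1254"))  # Turkish_Turkey.1254
```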
msg350491 - (view) Author: Tim Golden (tim.golden) * (Python committer) Date: 2019-08-26 06:52
Thanks, Eryk. Your explanation is as clear as always. But my question is, then: why is my machine failing this test [the only one which uses this two-part locale] and not the buildbots or (presumably) any other Windows developer?
msg350510 - (view) Author: Eryk Sun (eryksun) * (Python triager) Date: 2019-08-26 08:16
> But my question is, then: why is my machine failing this test [the 
> only one which uses this two-part locale] and not the buildbots or 
> (presumably) any other Windows developer?

test_getsetlocale_issue1813 fails for me as well. I can't imagine how setlocale(LC_CTYPE, "tr_TR.ISO8859-9") would succeed with recent versions of the Universal CRT in Windows. It parses "tr_TR" as a BCP 47 locale name, which only supports UTF-8 (e.g. "tr_TR.utf-8") and implicit ANSI (e.g. "tr_TR"). Plus "ISO8859-9" in general isn't a supported encoding of the form ".<codepage>", ".ACP" (ANSI), ".utf8", or ".utf-8". 

With the old CRT (2.x and <=3.4) and older versions of the Universal CRT, the initial locale.setlocale(locale.LC_CTYPE, 'tr_TR') call fails as an unsupported locale, so the test is skipped: 

    test_getsetlocale_issue1813 (__main__.TestMiscellaneous) ... skipped 'test needs Turkish locale'

The old CRT only supports "trk_TUR", "trk_Turkey", "turkish_TUR", and "turkish_Turkey".
msg350548 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2019-08-26 16:33
So is the fix here to update locale._build_localename to check something like this?

if encoding is None:
    return language
elif sys.platform == 'win32' and encoding not in {'utf8', 'utf-8'}:
    return language
else:
    return language + '.' + encoding
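
For experimentation, the fragment above can be wrapped in a standalone function. This is only a sketch of the idea being floated, not the actual locale._build_localename; the function name and the `platform` parameter (added so the Windows branch can be exercised on any system) are assumptions.

```python
import sys


def build_localename_sketch(language, encoding, platform=sys.platform):
    """Drop any non-UTF-8 encoding on Windows, as proposed above (sketch)."""
    if encoding is None:
        return language
    if platform == "win32" and encoding not in {"utf8", "utf-8"}:
        # The proposal: ucrt rejects e.g. "tr_TR.ISO8859-9", so omit the suffix.
        return language
    return language + "." + encoding


print(build_localename_sketch("tr_TR", "ISO8859-9", platform="win32"))  # tr_TR
print(build_localename_sketch("tr_TR", "utf-8", platform="win32"))      # tr_TR.utf-8
print(build_localename_sketch("tr_TR", "ISO8859-9", platform="linux"))  # tr_TR.ISO8859-9
```

Note that this version would also drop the codepage from a classic name such as ("trk_TUR", "1254"), which is one reason the simple check is not sufficient on its own.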
msg350549 - (view) Author: Tim Golden (tim.golden) * (Python committer) Date: 2019-08-26 17:19
I agree that that could be a fix. And certainly, if it turns out that this could never have (recently) worked as Eryk is suggesting, then let's go for it.

But I still have this uneasy feeling that it's not failing on the buildbots and I can't see any sign of a skipped test in the test stdio. I just wonder whether there's something else at play here.
msg350559 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2019-08-26 18:26
I pushed a custom buildbot run that only runs this test in verbose mode, and it looks like the test is being skipped some other way?

https://buildbot.python.org/all/#/builders/48/builds/36
https://buildbot.python.org/all/#/builders/42/builds/54

I don't see any evidence there that it's running at all, though I do on my own machine.

Perhaps one of the other buildbot settings causes it to run in a different order and something skips the entire class? I haven't dug in enough to figure that out yet.
msg350568 - (view) Author: Eryk Sun (eryksun) * (Python triager) Date: 2019-08-26 20:51
We get into trouble with test_getsetlocale_issue1813 because normalize() maps "tr_TR" (supported) to "tr_TR.ISO8859-9" (not supported).

    >>> locale.normalize('tr_TR')
    'tr_TR.ISO8859-9'

We should skip normalize() in Windows. It's based on a POSIX locale_alias mapping that can only cause problems. The work for normalizing locale names in Windows is best handled inline in _build_localename and _parse_localename.

For the old long form, C setlocale always returns the codepage encoding (e.g. "Turkish_Turkey.1254") or "utf8", so that's simple to parse. For BCP 47 locales, the encoding is either "utf8" or "utf-8", or nothing at all. For the latter, there's an implied legacy ANSI encoding. This is used by the CRT wherever we depend on byte strings, such as in time.strftime:

mojibake:

    >>> locale.setlocale(locale.LC_CTYPE, 'en_GB')
    'en_GB'
    >>> time.strftime("\u0100")
    'A'

correct:

    >>> locale.setlocale(locale.LC_CTYPE, 'en_GB.utf-8')
    'en_GB.utf-8'
    >>> time.strftime("\u0100")
    'Ā'

(We should switch back to using wcsftime if possible.)

The implicit BCP-47 case can be parsed as `None` -- e.g. ("tr_TR", None). However, it might be useful to support getting the ANSI codepage via GetLocaleInfoEx [1]. A high-level function in locale could internally call _locale.getlocaleinfo(locale_name, LOCALE_IDEFAULTANSICODEPAGE). This would return a string such as "1254", or "0" for a Unicode-only language. 

For _build_localename, we can't simply limit the encoding to UTF-8. We need to support the old long/abbreviated forms (e.g. "trk_TUR", "turkish_Turkey") in addition to the newer BCP 47 locale names. In the old form we have to support the following encodings:

    * codepage encodings, with an optional "cp" prefix that has 
      to be stripped, e.g. ("trk_TUR", "cp1254") -> "trk_TUR.1254"
    * "ACP" in upper case only -- for the ANSI codepage of the 
      language
    * "utf8" (mixed case) and "utf-8" (mixed case)

(The CRT documentation says "OEM" should also be supported, but it's not.)

A locale name can also omit the language in the old form -- e.g. (None, "ACP") or (None, "cp1254"). The CRT uses the current language in this case. This is discouraged because the result may be nonsense.

[1] https://docs.microsoft.com/en-us/windows/win32/api/winnls/nf-winnls-getlocaleinfoex
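
The encoding rules listed above for the old long/abbreviated form can be sketched as follows. Both function names are hypothetical, and the sketch covers only the classic form; it makes no attempt to decide whether the language part is a BCP 47 name.

```python
def classic_encoding_suffix(encoding):
    """Return the '.suffix' for a classic ucrt locale name (sketch)."""
    if encoding is None:
        return ""
    if encoding == "ACP":
        # "ACP" is accepted in upper case only.
        return ".ACP"
    lowered = encoding.lower()
    if lowered in ("utf8", "utf-8"):
        # UTF-8 is accepted in any letter case, so keep the caller's spelling.
        return "." + encoding
    if lowered.startswith("cp") and lowered[2:].isdigit():
        # Strip the optional "cp" prefix: "cp1254" -> "1254".
        return "." + lowered[2:]
    if encoding.isdigit():
        return "." + encoding
    raise ValueError(f"unsupported encoding for a classic locale name: {encoding!r}")


def build_classic_localename(language, encoding):
    """Join language and encoding; language may be None, e.g. (None, "ACP")."""
    return (language or "") + classic_encoding_suffix(encoding)


print(build_classic_localename("trk_TUR", "cp1254"))        # trk_TUR.1254
print(build_classic_localename("Turkish_Turkey", "UTF-8"))  # Turkish_Turkey.UTF-8
print(build_classic_localename(None, "ACP"))                # .ACP
```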
msg350569 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2019-08-26 21:23
Oh yeah, that locale_alias table is useless on Windows :(

But at least the function is documented in such a way that we can change it: "The returned locale code is formatted for use with :func:`setlocale`."

Alternatively, we could make setlocale() do its own normalization step on Windows and ignore (or otherwise validate/reject) the encoding.

None of that explains why the test doesn't seem to run at all on the buildbots though.
msg350571 - (view) Author: Eryk Sun (eryksun) * (Python triager) Date: 2019-08-26 21:43
> None of that explains why the test doesn't seem to run at all on the 
> buildbots though.

Are the buildbots using an older version of UCRT? BCP 47 locales used to strictly require a hyphen as the delimiter (e.g. 'tr-TR') instead of underscore (e.g. 'tr_TR'). Supporting underscore and UTF-8 are relatively recent additions that aren't documented yet. Even WINAPI GetLocaleInfoEx supports underscore as the delimiter now, which is also undocumented behavior.
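
One way to check which delimiter a given runtime accepts is to try both spellings and see which one setlocale takes. The helper below is a hypothetical sketch; it restores the original setting before returning, and its result depends entirely on the platform and CRT version.

```python
import locale


def first_supported(category, candidates):
    """Return the first candidate that setlocale() accepts, else None.

    The category's original setting is restored before returning.
    """
    saved = locale.setlocale(category)  # query only, does not change anything
    try:
        for name in candidates:
            try:
                locale.setlocale(category, name)
            except locale.Error:
                continue
            return name
        return None
    finally:
        locale.setlocale(category, saved)


# Result varies by system: 'tr_TR', 'tr-TR', or None.
print(first_supported(locale.LC_CTYPE, ["tr_TR", "tr-TR"]))
```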
msg350573 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2019-08-26 21:46
> test_getsetlocale_issue1813 (test.test_locale.TestMiscellaneous) ... skipped 'test needs Turkish locale'

Yeah, looks like they're failing that part of the test. I'll run them again with the hyphen.
msg350574 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2019-08-26 21:51
Oh man, this is too broken for me to think about today...

If someone feels like writing a Windows-specific normalize() function to totally replace the Unix one, feel free, but it looks like we won't be able to get away with anything less. The "easy" change breaks a variety of other tests.
msg350598 - (view) Author: Tim Golden (tim.golden) * (Python committer) Date: 2019-08-27 04:59
This feels like one of those changes where what's in place is clearly flawed but any change seems like it'll break stuff which people have had in place for years.

I'll try to look at a least-breaking change but I'm honestly not sure what that would look like.
msg350820 - (view) Author: Eryk Sun (eryksun) * (Python triager) Date: 2019-08-29 19:47
Here's some additional background information for work on this issue.

A Unix locale identifier has the following form:

    "language[_territory][.codeset][@modifier]"
        | "POSIX"
        | "C"
        | ""
        | NULL

(X/Open Portability Guide, Issue 4, 1992 -- aka XPG4)

Some systems also implement "C.UTF-8". 

The language and territory should use ISO 639 and ISO 3166 alpha-2 codes. The "@" modifier may indicate an alternate script such as "sr_RS@latin" or an alternate currency such as "de_DE@euro". For the optional codeset, IANA publishes the following table of character sets:

http://www.iana.org/assignments/character-sets/character-sets.xhtml

In Debian Linux, the available encodings are defined by mapping files in "/usr/share/i18n/charmaps". But encodings can't be arbitrarily used in locales at run time. A locale has to be generated (see "/etc/locale.gen") before it's available. 
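
As an illustration of the Unix form above, the pattern "language[_territory][.codeset][@modifier]" can be split with a regular expression. This is a simplified sketch: `parse_unix_locale` is a hypothetical helper, it does not validate the codes against ISO 639 or ISO 3166, and it does not handle the empty string or NULL cases.

```python
import re

# language[_territory][.codeset][@modifier]
_UNIX_LOCALE = re.compile(
    r"(?P<language>[A-Za-z]+)"
    r"(?:_(?P<territory>[A-Za-z0-9]+))?"
    r"(?:\.(?P<codeset>[^@]+))?"
    r"(?:@(?P<modifier>.+))?"
)


def parse_unix_locale(identifier):
    """Split a Unix locale identifier into its four optional parts."""
    match = _UNIX_LOCALE.fullmatch(identifier)
    if match is None:
        raise ValueError(f"not a Unix locale identifier: {identifier!r}")
    return match.groupdict()


print(parse_unix_locale("sr_RS@latin"))
print(parse_unix_locale("tr_TR.ISO8859-9"))
print(parse_unix_locale("C.UTF-8"))
```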

A Windows (not ucrt) locale name has the following form:

    "ISO639Language[-ISO15924Script][-ISO3166Region][SubTag][_SortOrder]"
        | ""                      | LOCALE_NAME_INVARIANT
        | "!x-sys-default-locale" | LOCALE_NAME_SYSTEM_DEFAULT
        | NULL                    | LOCALE_NAME_USER_DEFAULT

The invariant locale provides stable data. The system and user default locales vary according to the Control Panel "Region" settings.

A locale name is based on BCP 47 language tags, with the form "<language>-<script>-<region>" (e.g. "en-Latn-GB"), for which the script and region codes are optional. The language is an ISO 639 alpha-2 or alpha-3 code, with alpha-2 preferred. The script is an initial-uppercase ISO 15924 code. The region is an ISO 3166-1 alpha-2 or numeric-3 code, with alpha-2 preferred. 

As specified, the sort-order code should be delimited by an underscore, but Windows 10 (maybe older versions also?) accepts a hyphen instead. Here's a list of the sort-order codes that I've seen:

    * mathan - Math Alphanumerics       ( x-IV_mathan)
    * phoneb - Phone Book               (de-DE_phoneb)
    * modern - Modern                   (ka-GE_modern)
    * tradnl - Traditional              (es-ES_tradnl)
    * technl - Technical                (hu-HU_technl)
    * radstr - Radical/Stroke           (ja-JP_radstr)
    * stroke - Stroke Count             (zh-CN_stroke)
    * pronun - Pronunciation (Bopomofo) (zh-TW_pronun)

One final note of interest about Windows locales is that the user-interface language has been functionally isolated from the locale. The display language is handled by the Multilingual User Interface (MUI) API, which depends on .mui files in locale-named subdirectories of a binary, such as "kernel32.dll" -> "en-US\kernel32.dll.mui". Windows 10 has an option to configure the user locale to match the preferred display language. This helps to keep the two in sync, but they're still functionally independent.

The Universal CRT (ucrt) in Windows supports the following syntax for a locale identifier:

    "ISO639Language[-ISO15924Script][-ISO3166Region][.utf8|.utf-8]"
        | "ISO639Language[-ISO15924Script][-ISO3166Region][SubTag][_SortOrder]"
        | "language[_region][.codepage|.utf8|.utf-8]"
        | ".codepage" | ".utf8" | ".utf-8"
        | "C"
        | ""
        | NULL

NULL is used with setlocale to query the current value of a category. The empty string "" is the current-user locale. "C" is a minimal locale. For LC_CTYPE, "C" uses Latin-1, but for LC_TIME it uses the system ANSI codepage (possibly multi-byte), which can lead to mojibake. The "POSIX" locale is not supported, nor is "C.UTF-8". 

Note that UTF-8 support is relatively new, as is the ability to set the encoding without also specifying a region (e.g. "english.utf8").

Recent versions of ucrt extend BCP-47 support in a couple of ways. Underscore is allowed in addition to hyphen as the tag delimiter (e.g. "en_GB" instead of "en-GB"), and specifying UTF-8 as the encoding (and only UTF-8) is supported. If UTF-8 isn't specified, internally the locale defaults to the language's ANSI codepage. ucrt has to parse BCP 47 locales manually if they include an encoding, and also in some cases when underscore is used. Currently this fails to handle a sort-order tag, so we can't use, for example, "de_DE_phoneb.utf8".
msg350823 - (view) Author: Eryk Sun (eryksun) * (Python triager) Date: 2019-08-29 19:53
If normalize() is implemented for Windows, then the tests should be split out into POSIX and Windows versions. Currently, most of the tests in NormalizeTest are not checking a result that's properly normalized for ucrt.

A useful implementation of locale.normalize should allow a script to use ("en_US", "iso8859_1") in Windows without having to know that Latin-1 is Windows codepage 28591, or that ucrt requires a classic locale name if the encoding isn't UTF-8. The required result for setlocale() is "English_United States.28591". 

As far as aliases are concerned, at a minimum, we need to map "posix" and "c" to "C". We can also support "C.UTF-8" as "en_US.UTF-8". Do we need to support the Unix locale_alias mappings from X.org? If so, I suppose we could use a double mapping. First try the Unix locale_alias mapping. Then try that result in a windows_locale_alias mapping that includes additional mappings from Unix to Windows. For example: 

    sr_CS.UTF-8          -> sr_Cyrl_CS.UTF-8
    sr_CS.UTF-8@latin    -> sr_Latn_CS.UTF-8
    ca_ES.UTF-8@valencia -> ca_ES_valencia.UTF-8

Note that the last one doesn't currently work. "ca-ES-valencia" is a valid Windows locale name for the Valencian variant of Catalan (ca), which lacks an ISO 639 code of its own since it's officially (and somewhat controversially) designated as a dialect of Catalan. This is an unusual case that has a subtag after the region, which ucrt's manual BCP-47 parsing cannot handle. (It tries to parse "ES" as the script and "valencia" as an ISO 3166-1 country code.)

After mapping aliases, if the result still has "@" in it, normalize() should fail. We don't know what the "@" modifier means.

Otherwise, split the locale name and encoding parts. If the encoding isn't UTF-8, try to map it to a codepage. For this we need a  windows_codepage_alias dict that maps IANA official and Python-specific encoding names to Windows codepages. Next, check the locale name via WINAPI IsValidLocaleName. If it's not valid, try replacing underscore with hyphen and check again. Otherwise assume it's a classic ucrt locale name. (It may not be valid, but implementing all of the work ucrt does to parse a classic locale name is too much I think.) If it's a valid Windows locale name, and we have a codepage encoding, then try to translate it as a classic ucrt locale name. This requires two WINAPI GetLocaleInfoEx calls to look up the English versions of the language and country name.
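
The first few steps of that procedure can be sketched in pure Python. Everything named here is an assumption made for illustration: `WINDOWS_CODEPAGE_ALIAS` is a tiny hypothetical table, not an existing mapping, and `is_valid` stands in for a wrapper around WINAPI IsValidLocaleName so that the logic can be run on any platform. The final translation to a classic ucrt name via GetLocaleInfoEx is not shown.

```python
import codecs

# Hypothetical and deliberately tiny; a real table would need many more entries.
WINDOWS_CODEPAGE_ALIAS = {
    "iso8859-1": "28591",
    "iso8859-9": "28599",
    "cp1252": "1252",
    "cp1254": "1254",
}


def split_and_map_encoding(name):
    """Split 'locale.encoding' and map the encoding to a Windows codepage."""
    if "@" in name:
        # After alias mapping, a remaining "@" modifier has no known meaning.
        raise ValueError(f"unknown '@' modifier in {name!r}")
    locale_name, sep, encoding = name.partition(".")
    if not sep:
        return locale_name, None
    try:
        # codecs.lookup() resolves Python and IANA aliases to one canonical name.
        canonical = codecs.lookup(encoding).name
    except LookupError:
        raise ValueError(f"unknown encoding {encoding!r}") from None
    if canonical == "utf-8":
        return locale_name, "utf-8"
    try:
        return locale_name, WINDOWS_CODEPAGE_ALIAS[canonical]
    except KeyError:
        raise ValueError(f"no codepage known for {encoding!r}") from None


def pick_windows_name(locale_name, is_valid):
    """Try the name as given, then with hyphens; None means 'assume classic'."""
    for candidate in (locale_name, locale_name.replace("_", "-")):
        if is_valid(candidate):
            return candidate
    return None


print(split_and_map_encoding("en_US.iso8859_1"))  # ('en_US', '28591')
print(pick_windows_name("tr_TR", {"tr-TR"}.__contains__))  # tr-TR
```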
msg351788 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2019-09-11 09:51
FYI I just closed issue10466 as a duplicate (even though that one's been around longer, this issue has more relevant information on it).
msg361804 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-02-11 12:15
I marked bpo-38324 as a duplicate of this issue.
msg361806 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-02-11 12:24
For me, the locale.getlocale() function is broken. Python attempts to guess too many things about the current locale. For example, it relies on a hard-coded locale.locale_alias dictionary which seems to come from X11!? In 2020, Wayland replaced X11 and locales are updated frequently. This dictionary makes no sense on Windows. For example, 'ka_ge' is mapped to 'ka_GE.GEORGIAN-ACADEMY': "GEORGIAN-ACADEMY" is not an encoding, what is the purpose of this string?

I fixed dozens and dozens of locale issues and I never ever used locale.getlocale(). I never understood its purpose nor how it guesses the encoding.

I always use locale.setlocale(category) or locale.setlocale(category, None) which returns a simple string which I can pass back to locale.setlocale(category, string) to restore the locale, when I need to temporarily change a locale.
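
The save-and-restore pattern described above can be packaged as a context manager. This is a minimal sketch; `temporary_locale` is a hypothetical helper name, not part of the locale module.

```python
import contextlib
import locale


@contextlib.contextmanager
def temporary_locale(category, name):
    """Switch a locale category for the duration of a with-block."""
    # Query the current setting as a plain string; this is the value
    # that setlocale() is guaranteed to accept back.
    saved = locale.setlocale(category)
    try:
        locale.setlocale(category, name)
        yield
    finally:
        locale.setlocale(category, saved)


with temporary_locale(locale.LC_CTYPE, "C"):
    print(locale.setlocale(locale.LC_CTYPE))  # C
```

Because the saved value is the string returned by setlocale itself, no tuple is ever built or parsed, so the normalization path discussed in this issue is never involved.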
msg379068 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2020-10-20 00:15
test__locale (#38324) also passed CI and buildbots while failing locally.
msg382739 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2020-12-08 14:06
This seemingly useless test is the only test failure for me with installed 3.9.1.  Why keep it, at least on Windows?

The failure with "locale.Error: unsupported locale setting" is not limited to Windows.  #25191 and duplicate #31636 report the same error on user machines (non-buildbot, as here). #25191 proposes the following patch to skip this failure.

-        locale.setlocale(locale.LC_CTYPE, loc)
-        self.assertEqual(loc, locale.getlocale(locale.LC_CTYPE))
+        try:
+            locale.setlocale(locale.LC_CTYPE, loc)
+            self.assertEqual(loc, locale.getlocale(locale.LC_CTYPE))
+        except locale.Error:
+            # Unsupported locale setting
+            self.skipTest('unsupported locale setting')

I believe that this is effectively the same as deleting the test.  But if we believe it is being skipped on at least Windows buildbots, then we should do the same at least on user Windows machines.  Or, if we could detect user machines, skip on them.
msg389005 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-03-18 09:28
"ERROR: test_getsetlocale_issue1813 (test.test_locale.TestMiscellaneous)" fails on the Windows x64 job of GitHub Actions when Python is built in debug mode:
https://github.com/python/cpython/pull/24914
msg389657 - (view) Author: David Bolen (db3l) * Date: 2021-03-29 04:36
The test has also begun failing on the Win10 buildbot (after updating to 20H2 from an older 1803).
msg389661 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2021-03-29 06:33
So, can we delete it?

PR 19781 is for #43510 and is listed here above only because this issue is mentioned.
msg389784 - (view) Author: David Bolen (db3l) * Date: 2021-03-30 03:04
I don't have much of a horse in the race, but since the test has historically been skipped on Windows, and the test hasn't and doesn't work on Windows, modifications to restore the skip behavior seem reasonable to me.  The trigger for this issue was Windows adding support for underscore in locale names (like tr_TR) so the test began executing.  But it's not a regression or new issue, it's just existing reality becoming exposed.

The user machine and buildbot discrepancy can be attributed to version differences, as the buildbot hadn't yet received the same underscore locale name support.

I'd be fine with removing the test entirely - always skipping on a failure just seems pointless.  Then again, issue1813 created the test for a purpose on other systems, though even back then it appears it was complicated.  Leaving the test but skipping known failing systems (I guess at least Windows and OpenBSD) might be slightly less intrusive of a change, assuming the test is still serving a purpose elsewhere.

Separately, there's a lot of useful/interesting detail here which could inform any eventual normalization changes on Windows, should the underlying issue be deemed worthy of addressing.  But that seems like something that could be a distinct operation from clearing up the test issue.
msg389839 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2021-03-30 17:09
This is now holding up some security releases (due to a couple of CVEs). Can we get the test skipped or fixed asap, please?
msg389867 - (view) Author: David Bolen (db3l) * Date: 2021-03-31 03:09
In lieu of the patch in #25191, what about a pair of skips to deal with the issues at hand without killing the test entirely?  I'm including OpenBSD since those issues were closed in favor of this one, and am assuming that skipping there is also appropriate.

--- a/Lib/test/test_locale.py
+++ b/Lib/test/test_locale.py
@@ -552,6 +552,10 @@ def test_setlocale_category(self):
         # crasher from bug #7419
         self.assertRaises(locale.Error, locale.setlocale, 12345)
 
+    @unittest.skipIf(sys.platform == 'win32',
+                     "Test broken on Windows (issue #37945)")
+    @unittest.skipIf(sys.platform.startswith('openbsd'),
+                     "Test broken on OpenBSD (issues #31636 and #25191)")
     def test_getsetlocale_issue1813(self):
         # Issue #1813: setting and getting the locale under a Turkish locale
         oldlocale = locale.setlocale(locale.LC_CTYPE)
msg389883 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-03-31 09:52
On Windows 10 build 1903,

vstinner@WIN C:\vstinner\python\master>python -m test test_locale -m test_getsetlocale_issue1813 -v
== CPython 3.10.0a6+ (heads/master:ff3c9739bd, Mar 31 2021, 12:43:26) [MSC v.1916 64 bit (AMD64)]
== Windows-10-10.0.18362-SP0 little-endian
(...)
======================================================================
ERROR: test_getsetlocale_issue1813 (test.test_locale.TestMiscellaneous)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "C:\vstinner\python\master\lib\test\test_locale.py", line 567, in test_getsetlocale_issue1813
    locale.setlocale(locale.LC_CTYPE, loc)
  File "C:\vstinner\python\master\lib\locale.py", line 610, in setlocale
    return _setlocale(category, locale)
locale.Error: unsupported locale setting


It's a bug in the weird locale.getlocale() function which produces a locale name which doesn't exist:

vstinner@WIN C:\vstinner\python\master>python
>>> import locale
>>> locale.setlocale(locale.LC_CTYPE, "tr_TR") 
'tr_TR'
>>> locale.setlocale(locale.LC_CTYPE, None)
'tr_TR'
>>> locale.getlocale(locale.LC_CTYPE)           
('tr_TR', 'ISO8859-9')


>>> locale.setlocale(locale.LC_CTYPE, ('tr_TR', 'ISO8859-9'))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\vstinner\python\master\lib\locale.py", line 610, in setlocale
    return _setlocale(category, locale)
locale.Error: unsupported locale setting

>>> locale.setlocale(locale.LC_CTYPE, 'tr_TR')
'tr_TR'


If you use setlocale(LC_CTYPE, None) to get the locale, it works as expected.

IMO the getlocale() function is dangerous and should be removed: see bpo-43557 "Deprecate getdefaultlocale(), getlocale() and normalize() functions".
msg389884 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-03-31 09:56
I wrote PR 25110 to simply skip the test if setlocale() fails. It fixes the issue on Windows (I tested manually, see my comment on my PR), and it should also fix the issue on OpenBSD and any platform where getlocale() returns a locale not accepted by setlocale().

Again, don't use getlocale(category) but setlocale(category, None).
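
For readers who want to try the approach, a test that skips when setlocale() rejects the getlocale() result might look like the sketch below. It illustrates the idea only and should not be read as the exact code of PR 25110; the class and method names are made up for this example.

```python
import locale
import unittest


class GetSetLocaleRoundTrip(unittest.TestCase):
    def test_round_trip(self):
        # Save the current setting as a string and restore it afterwards.
        oldlocale = locale.setlocale(locale.LC_CTYPE)
        self.addCleanup(locale.setlocale, locale.LC_CTYPE, oldlocale)
        try:
            locale.setlocale(locale.LC_CTYPE, "tr_TR")
        except locale.Error:
            self.skipTest("test needs Turkish locale")
        loc = locale.getlocale(locale.LC_CTYPE)
        try:
            # getlocale() may build a name that setlocale() does not accept.
            locale.setlocale(locale.LC_CTYPE, loc)
        except locale.Error as exc:
            self.skipTest(f"setlocale(LC_CTYPE, {loc!r}) failed: {exc!r}")
        self.assertEqual(loc, locale.getlocale(locale.LC_CTYPE))
```

Saved to a file, this can be run with `python -m unittest`; on a system without a Turkish locale it reports a skip rather than an error.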
msg389885 - (view) Author: Łukasz Langa (lukasz.langa) * (Python committer) Date: 2021-03-31 10:01
Yeah, I'm making the change David suggested. It applies to 3.8 as well.
msg389886 - (view) Author: Łukasz Langa (lukasz.langa) * (Python committer) Date: 2021-03-31 10:02
Oh, Victor's solution is fine as well.
msg389888 - (view) Author: Łukasz Langa (lukasz.langa) * (Python committer) Date: 2021-03-31 11:01
New changeset f3ab670fea75ebe177e3412a5ebe39263cd428e3 by Victor Stinner in branch 'master':
bpo-37945: Fix test_locale.test_getsetlocale_issue1813() (#25110)
https://github.com/python/cpython/commit/f3ab670fea75ebe177e3412a5ebe39263cd428e3
msg389890 - (view) Author: Łukasz Langa (lukasz.langa) * (Python committer) Date: 2021-03-31 11:52
New changeset fabdd25fe505c08da064425ea4d099fd2cef39d3 by Miss Islington (bot) in branch '3.9':
bpo-37945: Fix test_locale.test_getsetlocale_issue1813() (GH-25110) (GH-25112)
https://github.com/python/cpython/commit/fabdd25fe505c08da064425ea4d099fd2cef39d3
msg389891 - (view) Author: Łukasz Langa (lukasz.langa) * (Python committer) Date: 2021-03-31 11:52
New changeset e143eea4b56ac7ae611e5bcc41eedbc572aa41c3 by Miss Islington (bot) in branch '3.8':
bpo-37945: Fix test_locale.test_getsetlocale_issue1813() (GH-25110) (GH-25113)
https://github.com/python/cpython/commit/e143eea4b56ac7ae611e5bcc41eedbc572aa41c3
msg389892 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-03-31 12:08
Ok, the initial issue has been fixed: test_locale passes again on Windows.

Let's continue the discussion on getlocale() in bpo-43557 "Deprecate getdefaultlocale(), getlocale() and normalize() functions" ;-)
msg389938 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2021-03-31 22:40
Great!  For the first time in over 2 years, the test suite passes on a Windows repository build on my machine.  I will test installed 3.10 after the next alpha release.  (3.10.0a7 has other failures as well.)
msg389940 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-03-31 22:47
> Great!  For the first time in over 2 years, the test suite passes on a Windows repository build on my machine. 

Nice :-)
History
Date User Action Args
2021-03-31 22:47:06 vstinner set messages: + msg389940
2021-03-31 22:40:08 terry.reedy set messages: + msg389938
2021-03-31 12:08:34 vstinner set status: open -> closed; priority: release blocker -> ; messages: + msg389892; resolution: fixed; stage: patch review -> resolved
2021-03-31 11:52:34 lukasz.langa set messages: + msg389891
2021-03-31 11:52:22 lukasz.langa set messages: + msg389890
2021-03-31 11:02:14 miss-islington set pull_requests: + pull_request23857
2021-03-31 11:02:03 miss-islington set nosy: + miss-islington; pull_requests: + pull_request23856
2021-03-31 11:01:54 lukasz.langa set messages: + msg389888
2021-03-31 10:02:49 lukasz.langa set messages: + msg389886
2021-03-31 10:01:33 lukasz.langa set messages: + msg389885; versions: + Python 3.8
2021-03-31 09:56:57 vstinner set messages: + msg389884
2021-03-31 09:55:49 vstinner set pull_requests: - pull_request23809
2021-03-31 09:54:04 vstinner set pull_requests: + pull_request23854
2021-03-31 09:52:47 vstinner set messages: + msg389883
2021-03-31 03:09:40 db3l set messages: + msg389867
2021-03-30 17:09:19 steve.dower set priority: normal -> release blocker; nosy: + ned.deily, lukasz.langa; messages: + msg389839
2021-03-30 03:13:14 drichardson set nosy: - drichardson
2021-03-30 03:04:22 db3l set messages: + msg389784
2021-03-29 06:33:38 terry.reedy set messages: + msg389661
2021-03-29 05:46:57 methane set keywords: + patch; nosy: + methane; pull_requests: + pull_request23809; stage: needs patch -> patch review
2021-03-29 04:40:06 jkloth set nosy: + jkloth
2021-03-29 04:36:38 db3l set nosy: + db3l; messages: + msg389657
2021-03-18 09:28:50 vstinner set nosy: + vstinner; messages: + msg389005
2020-12-14 09:15:12 vstinner set nosy: - vstinner; title: test_locale.TestMiscellaneous.test_getsetlocale_issue1813() fails -> [Windows] test_locale.TestMiscellaneous.test_getsetlocale_issue1813() fails
2020-12-13 19:11:05 drichardson set nosy: + drichardson
2020-12-08 14:06:02 terry.reedy set nosy: + serhiy.storchaka; messages: + msg382739; title: [Windows] locale.getdefaultlocale() issues on Windows: test_locale.test_getsetlocale_issue1813() -> test_locale.TestMiscellaneous.test_getsetlocale_issue1813() fails
2020-12-08 14:01:21 terry.reedy link issue25191 superseder
2020-12-08 13:36:45 terry.reedy link issue40652 superseder
2020-10-20 00:15:16 terry.reedy set versions: + Python 3.10; nosy: + terry.reedy; messages: + msg379068; stage: needs patch
2020-02-11 19:36:49 eryksun unlink issue38324 superseder
2020-02-11 12:24:25 vstinner set messages: + msg361806
2020-02-11 12:15:22 vstinner set nosy: + vstinner; messages: + msg361804
2020-02-11 12:14:59 vstinner link issue38324 superseder
2019-09-30 18:34:32 vstinner set title: test_locale failing -> [Windows] locale.getdefaultlocale() issues on Windows: test_locale.test_getsetlocale_issue1813()
2019-09-11 20:55:53 guy.linton set nosy: + guy.linton
2019-09-11 09:51:28 steve.dower set messages: + msg351788
2019-09-11 09:49:22 steve.dower link issue10466 superseder
2019-08-29 19:53:50 eryksun set messages: + msg350823
2019-08-29 19:47:44 eryksun set messages: + msg350820
2019-08-27 04:59:23 tim.golden set messages: + msg350598
2019-08-26 21:51:59 steve.dower set messages: + msg350574
2019-08-26 21:46:48 steve.dower set messages: + msg350573
2019-08-26 21:43:13 eryksun set messages: + msg350571
2019-08-26 21:23:53 steve.dower set messages: + msg350569
2019-08-26 20:51:32 eryksun set messages: + msg350568
2019-08-26 18:26:48 steve.dower set messages: + msg350559
2019-08-26 17:19:31 tim.golden set messages: + msg350549
2019-08-26 16:33:11 steve.dower set messages: + msg350548
2019-08-26 08:16:44 eryksun set messages: + msg350510
2019-08-26 06:52:36 tim.golden set messages: + msg350491
2019-08-26 05:11:17 eryksun set messages: + msg350485
2019-08-25 19:34:05 tim.golden set messages: + msg350471
2019-08-25 19:29:13 tim.golden set messages: + msg350470
2019-08-25 17:45:32 xtreak set nosy: + eryksun, paul.moore, xtreak, zach.ware, steve.dower; components: + Windows
2019-08-25 17:33:21 tim.golden create