classification
Title: locale.getdefaultlocale() fails on Mac OS X with default language set to English
Type: behavior Stage: resolved
Components: Library (Lib) Versions: Python 3.9, Python 3.8, Python 3.7
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: Dmitry.Jemerov, Ilya.Kulakov, Tiger-222, alexander.sturm, barry-scott, karolyi, larryv, lemburg, loewis, mattheww, miss-islington, ncoghlan, ned.deily, r.david.murray, rfmoz, ronaldoussoren, serhiy.storchaka, tsparber, wolma
Priority: normal Keywords: needs review, patch

Created on 2013-07-06 12:19 by Dmitry.Jemerov, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name | Uploaded | Description
getdefaultlocale.patch | Dmitry.Jemerov, 2013-07-06 12:19 | Patch with tests
issue-18378-py27.txt | ronaldoussoren, 2015-05-15 09:37 |
issue-18378-py35.txt | ronaldoussoren, 2015-05-15 09:37 |
issue18378-2015-07-25-py36.txt | ronaldoussoren, 2015-07-25 10:45 |
Pull Requests
URL | Status | Linked
PR 14736 | merged | ronaldoussoren, 2019-07-13 11:46
PR 15569 | merged | miss-islington, 2019-08-29 04:34
PR 15570 | merged | miss-islington, 2019-08-29 04:34
Messages (45)
msg192422 - (view) Author: Dmitry Jemerov (Dmitry.Jemerov) Date: 2013-07-06 12:19
On Mac OS X 10.8 with the default language set to English (System Preferences | Language and Text), the default terminal application sets the LC_CTYPE environment variable to "UTF-8". If you run Python from the terminal and try to use locale.getdefaultlocale(), you get the following error:

> python
Python 2.7.2 (default, Oct 11 2012, 20:14:37)
[GCC 4.2.1 Compatible Apple Clang 4.0 (tags/Apple/clang-418.0.60)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import locale
>>> locale.getdefaultlocale()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/locale.py", line 496, in getdefaultlocale
    return _parse_localename(localename)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/locale.py", line 428, in _parse_localename
    raise ValueError, 'unknown locale: %s' % localename
ValueError: unknown locale: UTF-8

(The stacktrace is from Python 2.7 but Python 3.3 suffers from the same problem.)

There are numerous workarounds for this problem (turning off the "Set locale environment variables on startup" option in the terminal settings, adding "export LC_CTYPE=en_US.UTF8" to .bash_profile, or selecting a language other than English in the Language & Text settings), but these require additional configuration from the user's side.

I think that the more useful behavior is for Python to handle this behavior of the system and not crash, even though it doesn't strictly comply with the POSIX standard.

The attached patch (against current Python 3.4 master branch) is one possible fix.
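
The failure can also be contained on the caller's side. A minimal sketch, assuming a hypothetical helper named default_locale_or_fallback (it is not part of the locale module or of the attached patch) that treats the ValueError as "locale unknown" and returns a caller-chosen default:

```python
def default_locale_or_fallback(getter, fallback=(None, None)):
    """Call getter() and return fallback if it raises ValueError.

    Hypothetical helper for illustration only; getter is any callable
    that returns a (language, encoding) tuple.
    """
    try:
        return getter()
    except ValueError:
        # For example "unknown locale: UTF-8" on affected interpreters.
        return fallback
```

A caller would pass locale.getdefaultlocale as the getter and something like (None, "UTF-8") as the fallback. This only hides the symptom; it does not make the locale module accept the value.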
msg192429 - (view) Author: Ronald Oussoren (ronaldoussoren) * (Python committer) Date: 2013-07-06 12:33
Strange, I have LANG=en_US.UTF-8 in my environment and no LC_CTYPE. A clean test account does have the same behavior as you are seeing.
msg192433 - (view) Author: Ronald Oussoren (ronaldoussoren) * (Python committer) Date: 2013-07-06 12:49
The UTF-8 value seems suspect to me, but it is actually supported by the system; changing it to a nonsense value results in a failure in the C function setlocale.

As for the patch: I'd add this workaround only to the OSX platform (that is, test for sys.platform == 'darwin' before checking for UTF-8 as a value).
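
The platform check suggested above could look roughly like this. It is a simplified sketch, not the attached patch: parse_locale_name and its platform parameter are names made up for the example, and the real locale._parse_localename does more work than shown here.

```python
import sys


def parse_locale_name(name, platform=sys.platform):
    """Split a locale name into (language, encoding).

    Hypothetical, simplified stand-in for illustration only.
    """
    if "." in name:
        language, _, encoding = name.partition(".")
        return language, encoding
    if name in ("C", "POSIX"):
        return None, None
    if name == "UTF-8" and platform == "darwin":
        # Bare "UTF-8": an encoding without any language information.
        return None, "UTF-8"
    raise ValueError("unknown locale: %s" % name)
```

Dropping the platform test from the last branch gives the behavior of accepting "UTF-8" everywhere.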
msg192445 - (view) Author: Dmitry Jemerov (Dmitry.Jemerov) Date: 2013-07-06 14:24
Judging from the results of Googling for the error message, I'm far from the only one seeing this problem.

What exactly would be the benefit of adding the code to check for the platform?
msg192446 - (view) Author: Ronald Oussoren (ronaldoussoren) * (Python committer) Date: 2013-07-06 14:28
The test for darwin is needed because other platforms don't support "UTF-8" as a valid LC_CTYPE name. For example, on a recent Linux box:


>>> locale.setlocale(locale.LC_CTYPE, "UTF-8")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/python2.7/lib/python2.7/locale.py", line 539, in setlocale
    return _setlocale(category, locale)
locale.Error: unsupported locale setting

(And just calling setlocale to check if the value is valid is not an option because that changes process-global state)
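
For illustration, this is what such a probe would look like. It is a sketch with a made-up helper name, and it shows the problem rather than solving it: between the two setlocale() calls the locale is changed for the whole process, so other threads can observe the temporary value.

```python
import locale


def locale_is_supported(name, category=locale.LC_CTYPE):
    """Return True if setlocale() accepts name for the given category.

    Hypothetical helper. Not thread-safe: it temporarily changes
    process-global state and then restores the previous setting.
    """
    saved = locale.setlocale(category)  # query only, changes nothing
    try:
        locale.setlocale(category, name)
    except locale.Error:
        return False
    else:
        return True
    finally:
        # Put the previous setting back whether or not the probe worked.
        locale.setlocale(category, saved)
```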
msg192447 - (view) Author: Dmitry Jemerov (Dmitry.Jemerov) Date: 2013-07-06 14:34
Why exactly does this matter? UTF-8 not being a valid LC_CTYPE value simply means that no one running Linux will ever have LC_CTYPE set to UTF-8, and the branch will never be hit. 

OTOH, adding the check will make the code harder to test and simply larger (no code is always better than any non-zero amount of code).
msg192460 - (view) Author: Dmitry Jemerov (Dmitry.Jemerov) Date: 2013-07-06 16:23
A related issue (with a patch that touches the same locale parsing code) is http://bugs.python.org/issue5815
msg192820 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2013-07-10 15:45
Why do you need the "getdefaultlocale" function in the first place? I'd advise against using it, precisely because it can trigger problems like this one.
msg192821 - (view) Author: Dmitry Jemerov (Dmitry.Jemerov) Date: 2013-07-10 15:47
I personally don't, but the function is used by Sphinx, which is what I was trying to get to work when I ran into this problem.
msg192822 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2013-07-10 15:56
Regardless of the resolution here, the use of getdefaultlocale could be reported as a bug on the Sphinx tracker.
msg192827 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2013-07-10 16:15
FWIW, I couldn't find any use of getdefaultlocale in any of the hg revisions (using hg grep) in

https://bitbucket.org/birkenfeld/sphinx/

Instead, it's (probably) docutils, which has this code:

    locale_encoding = locale.getlocale()[1] or locale.getdefaultlocale()[1]
    # locale.getpreferredencoding([do_setlocale=True|False])
    # has side-effects | might return a wrong guess.
    # (cf. Update 1 in http://stackoverflow.com/questions/4082645/using-python-2-xs-locale-module-to-format-numbers-and-currency)

I find that quite unfortunate, since locale.getpreferredencoding() would have done the right thing (IMO).
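
A sketch of that alternative, assuming the caller only needs an encoding name. The function name is made up; passing False for do_setlocale asks getpreferredencoding() not to call setlocale() itself, which is the side effect the quoted comment worries about.

```python
import locale


def locale_encoding():
    """Return the preferred text encoding without calling setlocale().

    Hypothetical replacement for the getlocale()/getdefaultlocale() pair.
    """
    return locale.getpreferredencoding(False)
```

This path does not go through the locale name parser, so a bare "UTF-8" in LC_CTYPE does not raise ValueError here.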
msg209731 - (view) Author: Ronald Oussoren (ronaldoussoren) * (Python committer) Date: 2014-01-30 18:06
I just ran into this problem myself. 

On fresh installs of OSX 10.9 LC_CTYPE is set to "UTF-8" (at least for English-language users), and now Sphinx won't work :-(

Is Dmitry's patch acceptable (either as is, or with my suggestion of checking for sys.platform == "darwin")?
msg214397 - (view) Author: Ned Deily (ned.deily) * (Python committer) Date: 2014-03-21 18:31
Ronald or Dmitry, can you elaborate under what conditions you start your login shell on 10.9?  I cannot reproduce the behavior you observe.  With 10.9 Terminal.app and the default language settings in System Preferences and with the default Terminal.app preferences, specifically Settings -> (Profile) -> Advanced -> Character encoding -> Unicode (UTF-8) and "Set LANG environment variable on startup" checked, login sessions have LANG=en_US.UTF-8 defined and LC_CTYPE is not defined at all. Are you sure that isn't being created by a shell profile somewhere?  (I can't check earlier OS X releases at the moment.)  That said, I agree that, if OS X accepts "UTF-8" as a valid locale, the locale module should, too.
msg214555 - (view) Author: Ronald Oussoren (ronaldoussoren) * (Python committer) Date: 2014-03-23 09:00
I didn't get this on my previous system (which was basically a 10.4 system updated through 10.5, 10.7, ..., to 10.9), but did get it on my current system, which has a fresh 10.9 install where I did not use the Migration Assistant to migrate settings.

Thus for me to get the behavior with LC_CTYPE:

* New system with OSX 10.9 pre-installed
* Select "English" as the primary language
* Start Terminal.app and inspect the environment

I have not tried to reproduce this in a VM. 

BTW, I have the same system settings as you.
msg214556 - (view) Author: Ronald Oussoren (ronaldoussoren) * (Python committer) Date: 2014-03-23 09:04
With the following C code:

#include <locale.h>
#include <stdio.h>

int main(void)
{
	char* res = setlocale(LC_CTYPE, "UTF-8");
	printf("Result: %s\n", res);

	res = setlocale(LC_CTYPE, "UTF-9");
	printf("Result: %s\n", res);
	return 0;
}
/* EOF */

I get the following output:

Result: UTF-8
Result: (null)

That is, UTF-8 is a valid locale for LC_CTYPE, and as expected some other string isn't.

BTW. "UTF-8" is only a valid locale for LC_CTYPE, not for other categories (when you change LC_CTYPE to LC_ALL both calls return NULL).
msg214564 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2014-03-23 12:04
That is seriously broken on Apple's part.  But I guess we have no choice but to emulate their bug.
msg215215 - (view) Author: Ned Deily (ned.deily) * (Python committer) Date: 2014-03-31 00:20
I've looked at this a bit, primarily on OS X 10.9 Mavericks, although I expect mostly similar behavior on older recent releases of OS X.  On 10.9, the setting of locale variables is done by whatever program is used to launch a shell.  I looked at the behavior of the built-in Terminal.app, the third-party iTerm2.app, the MacPorts distribution of xterm, and the built-in sshd.  By default, the latter two do not set any locale env variables.  Both Terminal.app and iTerm2.app set either LANG or LC_CTYPE based on the user's settings for "Region" and "Preferred Language" in the "System Preferences" -> "Language & Region" control panel.  Three examples:

1. "Region" = "United States", "Preferred Language" = "English":
    -> LANG=en_US.UTF-8

2. "Region" = "Germany", "Preferred Language" = "German"
    -> LANG=de_DE.UTF-8

3. "Region" = "Germany", "Preferred Language" = "English"
    -> LC_CTYPE= "UTF-8"

So it is almost certainly the last case that is under discussion here.  Whether or not that is a bug is not as clear as it might seem at first.  BSD implementations of locale differ from the GNU Linux version.  Both FreeBSD and OS X define a "UTF-8" locale that has only one locale category defined in it: LC_CTYPE.  It appears to be a fallback locale used when there is no applicable region / language combination, in this case no "en_DE*" locales.

$ ls /usr/share/locale/UTF*
LC_CTYPE

Compare with the en_US* locales:

$ ls /usr/share/locale/en_US*
/usr/share/locale/en_US:
LC_COLLATE  LC_CTYPE    LC_MESSAGES LC_MONETARY LC_NUMERIC  LC_TIME

/usr/share/locale/en_US.ISO8859-1:
LC_COLLATE  LC_CTYPE    LC_MESSAGES LC_MONETARY LC_NUMERIC  LC_TIME

/usr/share/locale/en_US.ISO8859-15:
LC_COLLATE  LC_CTYPE    LC_MESSAGES LC_MONETARY LC_NUMERIC  LC_TIME

/usr/share/locale/en_US.US-ASCII:
LC_COLLATE  LC_CTYPE    LC_MESSAGES LC_MONETARY LC_NUMERIC  LC_TIME

/usr/share/locale/en_US.UTF-8:
LC_COLLATE  LC_CTYPE    LC_MESSAGES LC_MONETARY LC_NUMERIC  LC_TIME

Now as I read the current POSIX standard, there is nothing wrong with this.  AFAICT, the standard places no restriction on the format of locale names, in particular, it does not mandate that they conform to RFC 1766 or its successors.  Further, the standard provides for implementation-specific locales (other than the mandatory "POSIX" aka "C" locale) and some platforms provide tools to create custom locales, e.g. mklocale(1) on FreeBSD and OS X, localedef(1) on GNU Linux.  So I wonder if the locale module should really be imposing its own restrictions on locale names as it does currently.

From IEEE Std 1003.1, 2013 Edition:
"The capability to specify additional locales to those provided by an implementation is optional, denoted by the _POSIX2_LOCALEDEF symbol. If the option is not supported, only implementation-supplied locales are available. Such locales shall be documented using the format specified in this section. [...] The locale definition file shall contain one or more locale category source definitions, and shall not contain more than one definition for the same locale category. [...]  In the event that some of the information for a locale category, as specified in this volume of POSIX.1-2008, is missing from the locale source definition, the behavior of that category, if it is referenced, is unspecified."

There is a further complication for OS X.  Apple provides a richer native API for locales, CFLocale (and its Cocoa equivalent, NSLocale).  So some nuances may get lost in the imperfect mapping between CFLocale and the conventional LC_* environment variables and between them and Python.  We could look at trying to support the native APIs as well.

http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap07.html#tag_07
https://developer.apple.com/library/mac/documentation/CoreFoundation/Conceptual/CFLocales/CFLocales.html
https://developer.apple.com/library/mac/documentation/CoreFoundation/Reference/CFLocaleRef/Reference/reference.html
msg239485 - (view) Author: Barry Alan Scott (barry-scott) * Date: 2015-03-29 11:08
Mac OS X uses the __CF_USER_TEXT_ENCODING env var to set up the locale for native libraries.

I found that for GUI Python code I needed to convert the value in __CF_USER_TEXT_ENCODING into a suitable call to setlocale().

The code I use is attached to Issue23797.
msg239702 - (view) Author: Ronald Oussoren (ronaldoussoren) * (Python committer) Date: 2015-03-31 13:28
1) I agree with Ned that the OSX behavior is not broken; it is different but within spec. Python makes assumptions about the format of locale names that aren't universally valid.

2) We should be careful in using CFLocale. Those APIs are part of CoreFoundation, and CoreFoundation APIs cannot be used in the child process after calling os.fork.

As an aside to 2), CoreFoundation and any other Apple "Cocoa" frameworks should be assumed to use threads, and hence the comment about threads in the fork specification (link below) applies. Currently Apple doesn't appear to use pthread_atfork to make sure library state is valid in child processes after fork.

<http://pubs.opengroup.org/onlinepubs/009695399/functions/fork.html>
msg243262 - (view) Author: Ronald Oussoren (ronaldoussoren) * (Python committer) Date: 2015-05-15 09:44
Dmitry's patch looks good; I added my patch before checking whether there already was a patch.

The only thing that might cause discussion is when to accept 'UTF-8' as a valid locale name.  My patch only accepts it on OSX, while Dmitry's patch accepts it everywhere.

Writing this, I'm slightly in favour of Dmitry's approach: I quite often run into problems when using SSH to log in to a Linux box from my OSX laptop (with LC_CTYPE=UTF-8). Almost everything works correctly, except for Python code that uses the locale module (which craps out with the exception in the first message in this issue).

IMHO Dmitry's patch should be applied as is.
msg247318 - (view) Author: Ronald Oussoren (ronaldoussoren) * (Python committer) Date: 2015-07-25 08:27
ping...

I think the current behavior is a bug in Python and should be fixed in 2.7, 3.4, 3.5 and default (using Dmitry's patch). 

I'd like to commit the patch, but would like someone else's review of the patch before doing so.
msg247322 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-07-25 09:22
Tests are needed.

With the patch:

$ LC_CTYPE=UTF-8 ./python
>>> import locale
>>> locale.getdefaultlocale()
(None, 'UTF-8')
>>> locale.getpreferredencoding()
'ANSI_X3.4-1968'
>>> locale.getlocale()
(None, None)

$ LC_CTYPE=en_US_UTF-8 ./python
>>> import locale
>>> locale.getdefaultlocale()
('en_US', 'UTF-8')
>>> locale.getpreferredencoding()
'UTF-8'
>>> locale.getlocale()
('en_US', 'UTF-8')

I think getpreferredencoding() and getlocale() should return the UTF-8 encoding.
msg247326 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-07-25 09:46
Perhaps a better way to solve this issue is to use the aliases table. What is the LC_CTYPE environment variable set to when the default language is set to non-English? How do different native Mac OS X command-line programs behave when LC_CTYPE is set to another encoding (e.g. ASCII, US-ASCII, ISO8859-1, ISO-8859-1, Latin1)? What if it is set to UTF8 (no minus) or utf-8 (lower case)?
msg247333 - (view) Author: Ronald Oussoren (ronaldoussoren) * (Python committer) Date: 2015-07-25 10:12
The only locale that doesn't include language information is the UTF-8 one, there is no locale named "US-ASCII".

See /usr/share/locale on an OSX system.

PS. The more I look at locale.py the more problems I find with it. The code makes unwarranted assumptions about locales that aren't actually true on all systems.

For example:

>>> locale.normalize('ja_JP')
'ja_JP.eucJP'


That's not true on OSX, /usr/share/locale/ja_JP/LC_CTYPE is a symlink to /usr/share/locale/UTF-8/LC_CTYPE.

AFAIK *all* locales on OSX use UTF-8.
msg247335 - (view) Author: Ronald Oussoren (ronaldoussoren) * (Python committer) Date: 2015-07-25 10:16
The alias mechanism cannot be used because LC_CTYPE=UTF-8 as the locale doesn't imply anything about languages. 

In Linux terms it is more or less equal to "C.UTF-8" or "POSIX.UTF-8", except that those two aren't valid locales on OSX.
msg247338 - (view) Author: Ronald Oussoren (ronaldoussoren) * (Python committer) Date: 2015-07-25 10:43
Testing this is interesting to say the least due to the dynamic way the module interface is built.

Serhiy: are you testing on a Linux machine? On my machine getpreferredencoding() returns 'UTF-8' because it hits the CODESET path (which ends up calling ``_locale.nl_langinfo(_locale.CODESET)`` and that returns UTF-8).
msg247339 - (view) Author: Ronald Oussoren (ronaldoussoren) * (Python committer) Date: 2015-07-25 10:45
I've attached a patch with more tests, but I'm not too happy about the new test because it is too much of a white-box test and is therefore fairly fragile w.r.t. the actual implementation of the module.
msg247418 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-07-26 09:00
Yes, I was testing on a Linux machine and forgot that the results are OS-dependent.

I agree that the test should depend less on implementation details. Since _locale._getdefaultlocale is defined only on Windows and "UTF-8" is not a valid locale on Windows, I think there is no need to patch _locale for testing. But getlocale() and getpreferredencoding() should be consistent with getdefaultlocale() (and getlocale() is yet another way to test the private function _parse_localename()). setlocale() should work with the result of getlocale() and getdefaultlocale(). Do the following tests pass on OSX?
msg263659 - (view) Author: Wolfgang Maier (wolma) * Date: 2016-04-18 08:56
ping?

Just ran into this issue on OS X El Capitan with Region set to Germany and Language to English. Just as Ned pointed out 2 years ago, this results in LC_CTYPE set to 'UTF-8' in the terminal and docutils still can't cope with it.
msg268579 - (view) Author: Ilya Kulakov (Ilya.Kulakov) * Date: 2016-06-14 18:39
Could someone provide a patch for Python 3.5?
msg278540 - (view) Author: László Károlyi (karolyi) Date: 2016-10-12 19:41
OSX Sierra + Python, the bug still exists.

subscribing
msg285317 - (view) Author: Wolfgang Maier (wolma) * Date: 2017-01-12 12:28
To me this issue seems quite related to PEP 538. Maybe the LC_CTYPE coercion proposed in the PEP could be extended to cover the case of LC_CTYPE=UTF-8?
msg285318 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2017-01-12 12:59
PEP 538 wouldn't help here, as there's nothing wrong with CPython's assumptions about the text encoding to use for operating system interfaces - it's assuming UTF-8 (because it's Mac OS X) and that assumption is correct (because it's Mac OS X).

The problem appears to be that locale.py was written primarily for Linux, and hence makes assumptions that aren't valid on BSD and Mac OS X.

Dmitry's suggested solution of taking the BSD/Mac OS X specific locale of "UTF-8" and universally accepting it as meaning (None, "UTF-8") seems like a sensible step forward, even if it doesn't resolve all the discrepancies.

Where PEP 538 and PEP 540 would come into play is when this setting gets forwarded over SSH to Linux servers (as then CPython *will* get the nominal system text encoding wrong), but that's independent of getting the locale module to handle it more gracefully.
msg285319 - (view) Author: Wolfgang Maier (wolma) * Date: 2017-01-12 13:12
I think PEP 538 extended to the UTF-8 locale *would* help here. Specifically, it would coerce only LC_CTYPE to en_US.UTF-8 (unless OS X has C.UTF-8), which I guess is good enough for the purpose here.

I do agree that it is not the kind of problem that PEP 538 tries to solve right now, but it could be extended to cover other types of problematic locales like this one. Just wanted to make you aware of this possibility.
msg285329 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2017-01-12 14:44
I think Ronald's patch issue18378-2015-07-25-py36.txt with added darwin check would be the best way forward.

In the current form, it would allow using 'UTF-8' as locale string on all platforms - which is not such a good idea.
msg285360 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2017-01-13 03:47
SSH environment forwarding will propagate this "LC_CTYPE=UTF-8" setting from Mac OS X clients to Linux servers.

At present, that breaks in multiple ways, as CPython will interpret it as being the "C" locale (since Linux servers don't offer a "UTF-8" locale, even when they do offer "C.UTF-8")

PEPs 538 and 540 aim to help CPython itself to deal with that case, but that won't be sufficient to help code that tries to pass the nominal LC_CTYPE setting to the locale module.

Accepting "UTF-8" and interpreting it as functionally equivalent to C.UTF-8 will mean that this setting will at least work as desired on servers that offer C.UTF-8.
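
A rough sketch of that interpretation, under the assumption that the server offers C.UTF-8. The helper name is hypothetical and nothing like it exists in the locale module; it only illustrates the mapping being proposed.

```python
def coerce_forwarded_ctype(value, platform):
    """Map a forwarded LC_CTYPE value to one the local platform may know.

    Hypothetical helper: a bare "UTF-8" is kept on darwin, where it is a
    valid LC_CTYPE locale, and replaced by "C.UTF-8" elsewhere.
    """
    if value == "UTF-8" and platform != "darwin":
        return "C.UTF-8"
    return value
```

Whether the result is usable still depends on the server actually providing a C.UTF-8 locale.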
msg285370 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2017-01-13 08:48
On 13.01.2017 04:47, Nick Coghlan wrote:
> Accepting "UTF-8" and interpreting it as functionally equivalent to C.UTF-8 will mean that this setting will at least work as desired on servers that offer C.UTF-8.

I don't think that's within the scope of this patch. "UTF-8" is not
a valid locale setting on Linux and so Python should not allow
passing this through the locale normalization process on Linux.

Please also note that SSH does not forward arbitrary env vars.
Only a select few are forwarded and all others have to be
configured. The locale vars are not among the default ones
(see the ssh man page for details).

Aside: While looking into this I found that the locale module
aliases C.UTF-8 to en_US.UTF-8. This was added as part of
issue #20076 and originates from the X.org locale.alias file.
Time machine and all that :-)
msg296829 - (view) Author: Matthew Woodcraft (mattheww) Date: 2017-06-25 17:01
That alias (C.UTF-8 to en_US.UTF-8) is surely a bug in itself nowadays. I've filed #30755 .
msg333371 - (view) Author: Ricardo Fraile (rfmoz) Date: 2019-01-10 11:09
I still have this issue on MacOS Mojave 10.14

Python 3.7.2 (default, Dec 27 2018, 07:35:06)
[Clang 10.0.0 (clang-1000.11.45.5)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import locale
>>> locale.getdefaultlocale()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/Cellar/python/3.7.2/Frameworks/Python.framework/Versions/3.7/lib/python3.7/locale.py", line 568, in getdefaultlocale
    return _parse_localename(localename)
  File "/usr/local/Cellar/python/3.7.2/Frameworks/Python.framework/Versions/3.7/lib/python3.7/locale.py", line 495, in _parse_localename
    raise ValueError('unknown locale: %s' % localename)
ValueError: unknown locale: UTF-8
>>>

$ locale
LANG=
LC_COLLATE="C"
LC_CTYPE="UTF-8"
LC_MESSAGES="C"
LC_MONETARY="C"
LC_NUMERIC="C"
LC_TIME="C"
LC_ALL=
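
The output above is consistent with the traceback: LC_ALL is empty, so the lookup reaches LC_CTYPE, whose value is the bare "UTF-8". A simplified sketch of that lookup, assuming the documented default search order of getdefaultlocale() (LC_ALL, LC_CTYPE, LANG, LANGUAGE). The function name is made up, and details such as splitting LANGUAGE on ':' are left out.

```python
def first_locale_setting(environ,
                         names=("LC_ALL", "LC_CTYPE", "LANG", "LANGUAGE")):
    """Return (variable, value) for the first non-empty locale variable.

    Hypothetical helper; environ is any mapping, e.g. os.environ.
    Falls back to (None, "C") when no variable is set.
    """
    for name in names:
        value = environ.get(name)
        if value:
            return name, value
    return None, "C"
```

With the environment shown above this picks LC_CTYPE and hands "UTF-8" to the parser, which is the string reported in the ValueError.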
msg347803 - (view) Author: Ronald Oussoren (ronaldoussoren) * (Python committer) Date: 2019-07-13 10:55
LC_CTYPE=UTF-8 is a valid configuration on macOS, and is in the default environment when you install a fresh system. This includes the betas for macOS 10.15 and is therefore unlikely to change anytime soon.

Interestingly enough I get this error even when I unset the relevant environment variables. For some reason LC_CTYPE is reset when I start the interpreter, even if it is set to something else. This means the usual way of working around this problem no longer works.

I'll create a pull request with an up-to-date version of my latest patch for further discussion.

BTW. I'm testing with the current tip of the tree, but 3.7.3 fails in the same way.
msg347805 - (view) Author: Ronald Oussoren (ronaldoussoren) * (Python committer) Date: 2019-07-13 11:52
As promised there is now a pull request. 

I'd love a review (and a change to approve the pull request when reviewers are happy, I'm trying to get back into actively contributing).

---

I now understand why locale.getdefaultlocale() fails even when LC_CTYPE is not set: pylifecycle sets LC_CTYPE to UTF-8 in the UTF-8 coercion code.
msg350706 - (view) Author: Ned Deily (ned.deily) * (Python committer) Date: 2019-08-29 04:33
New changeset b0caf329815120acf50287e29858093d328b0e3c by Ned Deily (Ronald Oussoren) in branch 'master':
bpo-18378: Recognize "UTF-8" as a valid name in locale._parse_localename (GH-14736)
https://github.com/python/cpython/commit/b0caf329815120acf50287e29858093d328b0e3c
msg350709 - (view) Author: miss-islington (miss-islington) Date: 2019-08-29 04:52
New changeset 554143ebc2546e0b8b722dfafe397c0316f29980 by Miss Islington (bot) in branch '3.7':
bpo-18378: Recognize "UTF-8" as a valid name in locale._parse_localename (GH-14736)
https://github.com/python/cpython/commit/554143ebc2546e0b8b722dfafe397c0316f29980
msg350710 - (view) Author: miss-islington (miss-islington) Date: 2019-08-29 04:56
New changeset e471a543a4f7c52a8d0081ec5142adab3416d8fb by Miss Islington (bot) in branch '3.8':
bpo-18378: Recognize "UTF-8" as a valid name in locale._parse_localename (GH-14736)
https://github.com/python/cpython/commit/e471a543a4f7c52a8d0081ec5142adab3416d8fb
msg350731 - (view) Author: Ned Deily (ned.deily) * (Python committer) Date: 2019-08-29 06:30
Ronald's PR 14736 LGTM.  I merged it to master and backported for 3.8.0b4 and 3.7.5.  Thanks, everyone!
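
To check whether a given interpreter includes this change, one option is to call the private function named in the commit title. This is a sketch only: _parse_localename is an internal detail that may change or disappear, so the helper (a made-up name) returns None when it cannot tell.

```python
import locale


def utf8_locale_is_recognized():
    """Report whether the locale module parses the bare name "UTF-8".

    Returns True or False, or None if the private parser is unavailable.
    Hypothetical helper that relies on an internal function.
    """
    parse = getattr(locale, "_parse_localename", None)
    if parse is None:
        return None
    try:
        return parse("UTF-8") == (None, "UTF-8")
    except ValueError:
        # Older behavior: "unknown locale: UTF-8".
        return False
```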
History
Date | User | Action | Args
2022-04-11 14:57:47 | admin | set | github: 62578
2019-08-29 06:30:06 | ned.deily | set | status: open -> closed; versions: + Python 3.8, Python 3.9, - Python 2.7, Python 3.4, Python 3.5, Python 3.6; messages: + msg350731; resolution: fixed; stage: patch review -> resolved
2019-08-29 04:56:03 | miss-islington | set | messages: + msg350710
2019-08-29 04:52:45 | miss-islington | set | nosy: + miss-islington; messages: + msg350709
2019-08-29 04:34:15 | miss-islington | set | pull_requests: + pull_request15246
2019-08-29 04:34:09 | miss-islington | set | pull_requests: + pull_request15245
2019-08-29 04:33:55 | ned.deily | set | messages: + msg350706
2019-07-23 10:29:34 | Tiger-222 | set | nosy: + Tiger-222
2019-07-13 11:52:01 | ronaldoussoren | set | messages: + msg347805
2019-07-13 11:46:22 | ronaldoussoren | set | pull_requests: + pull_request14530
2019-07-13 10:55:27 | ronaldoussoren | set | messages: + msg347803
2019-01-10 11:09:59 | rfmoz | set | nosy: + rfmoz; messages: + msg333371
2017-06-25 17:01:55 | mattheww | set | nosy: + mattheww; messages: + msg296829
2017-01-13 08:48:56 | lemburg | set | messages: + msg285370
2017-01-13 07:52:08 | vstinner | set | nosy: - vstinner
2017-01-13 03:47:58 | ncoghlan | set | messages: + msg285360
2017-01-12 14:44:47 | lemburg | set | nosy: + lemburg; messages: + msg285329
2017-01-12 13:12:07 | wolma | set | messages: + msg285319
2017-01-12 12:59:26 | ncoghlan | set | messages: + msg285318
2017-01-12 12:28:02 | wolma | set | nosy: + ncoghlan; messages: + msg285317; versions: + Python 3.7
2016-10-12 19:41:49 | karolyi | set | nosy: + karolyi; messages: + msg278540
2016-09-29 18:17:59 | larryv | set | nosy: + larryv
2016-08-23 11:17:12 | alexander.sturm | set | nosy: + alexander.sturm
2016-06-14 18:39:29 | Ilya.Kulakov | set | nosy: + Ilya.Kulakov; messages: + msg268579
2016-04-19 13:11:29 | tsparber | set | nosy: + tsparber
2016-04-18 08:56:42 | wolma | set | nosy: + wolma; messages: + msg263659
2015-07-26 09:00:45 | serhiy.storchaka | set | messages: + msg247418
2015-07-25 10:45:18 | ronaldoussoren | set | files: + issue18378-2015-07-25-py36.txt; messages: + msg247339
2015-07-25 10:43:58 | ronaldoussoren | set | messages: + msg247338
2015-07-25 10:16:19 | ronaldoussoren | set | messages: + msg247335
2015-07-25 10:12:30 | ronaldoussoren | set | messages: + msg247333
2015-07-25 09:46:48 | serhiy.storchaka | set | messages: + msg247326
2015-07-25 09:22:42 | serhiy.storchaka | set | messages: + msg247322
2015-07-25 09:14:21 | serhiy.storchaka | set | nosy: + serhiy.storchaka; versions: + Python 3.5, Python 3.6, - Python 3.3
2015-07-25 08:27:05 | ronaldoussoren | set | messages: + msg247318
2015-05-15 09:44:25 | ronaldoussoren | set | messages: + msg243262
2015-05-15 09:37:51 | ronaldoussoren | set | files: + issue-18378-py35.txt
2015-05-15 09:37:38 | ronaldoussoren | set | files: + issue-18378-py27.txt
2015-03-31 13:28:30 | ronaldoussoren | set | messages: + msg239702
2015-03-29 11:08:38 | barry-scott | set | nosy: + barry-scott; messages: + msg239485
2014-03-31 00:20:54 | ned.deily | set | messages: + msg215215
2014-03-23 12:04:34 | r.david.murray | set | messages: + msg214564
2014-03-23 09:04:14 | ronaldoussoren | set | messages: + msg214556
2014-03-23 09:00:02 | ronaldoussoren | set | messages: + msg214555
2014-03-21 18:33:44 | ned.deily | link | issue20999 superseder
2014-03-21 18:31:26 | ned.deily | set | nosy: + ned.deily; messages: + msg214397
2014-01-30 18:06:52 | ronaldoussoren | set | messages: + msg209731
2013-07-10 16:15:34 | loewis | set | messages: + msg192827
2013-07-10 15:56:37 | r.david.murray | set | nosy: + r.david.murray; messages: + msg192822
2013-07-10 15:47:54 | Dmitry.Jemerov | set | messages: + msg192821
2013-07-10 15:45:25 | loewis | set | nosy: + loewis; messages: + msg192820
2013-07-06 16:23:29 | Dmitry.Jemerov | set | messages: + msg192460
2013-07-06 14:34:15 | Dmitry.Jemerov | set | messages: + msg192447
2013-07-06 14:28:06 | ronaldoussoren | set | messages: + msg192446
2013-07-06 14:24:35 | Dmitry.Jemerov | set | messages: + msg192445
2013-07-06 12:49:45 | ronaldoussoren | set | versions: + Python 3.3; messages: + msg192433; keywords: + needs review; type: behavior; stage: patch review
2013-07-06 12:37:19 | vstinner | set | nosy: + vstinner
2013-07-06 12:33:41 | ronaldoussoren | set | nosy: + ronaldoussoren; messages: + msg192429
2013-07-06 12:19:03 | Dmitry.Jemerov | create |