
locale.getdefaultlocale() fails on Mac OS X with default language set to English #62578

Closed
DmitryJemerov mannequin opened this issue Jul 6, 2013 · 45 comments
Labels
  • 3.7 (EOL): end of life
  • 3.8: only security fixes
  • 3.9: only security fixes
  • stdlib: Python modules in the Lib dir
  • type-bug: An unexpected behavior, bug, or error

DmitryJemerov mannequin commented Jul 6, 2013

BPO 18378
Nosy @malemburg, @loewis, @barry-scott, @ronaldoussoren, @ncoghlan, @mattheww, @ned-deily, @bitdancer, @larryv, @serhiy-storchaka, @wm75, @Kentzo, @tsparber, @karolyi, @miss-islington, @BoboTiG, @rfrail3
PRs
  • bpo-18378: Recognize "UTF-8" as a valid name in locale._parse_localename #14736
  • [3.8] bpo-18378: Recognize "UTF-8" as a valid name in locale._parse_localename (GH-14736) #15569
  • [3.7] bpo-18378: Recognize "UTF-8" as a valid name in locale._parse_localename (GH-14736) #15570
Files
  • getdefaultlocale.patch: Patch with tests
  • issue-18378-py27.txt
  • issue-18378-py35.txt
  • issue18378-2015-07-25-py36.txt
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    GitHub fields:

    assignee = None
    closed_at = <Date 2019-08-29.06:30:06.592>
    created_at = <Date 2013-07-06.12:19:03.656>
    labels = ['3.7', '3.8', 'type-bug', 'library', '3.9']
    title = 'locale.getdefaultlocale() fails on Mac OS X with default language set to English'
    updated_at = <Date 2019-08-29.06:30:06.586>
    user = 'https://bugs.python.org/DmitryJemerov'

    bugs.python.org fields:

    activity = <Date 2019-08-29.06:30:06.586>
    actor = 'ned.deily'
    assignee = 'none'
    closed = True
    closed_date = <Date 2019-08-29.06:30:06.592>
    closer = 'ned.deily'
    components = ['Library (Lib)']
    creation = <Date 2013-07-06.12:19:03.656>
    creator = 'Dmitry.Jemerov'
    dependencies = []
    files = ['30807', '39384', '39385', '40014']
    hgrepos = []
    issue_num = 18378
    keywords = ['patch', 'needs review']
    message_count = 45.0
    messages = ['192422', '192429', '192433', '192445', '192446', '192447', '192460', '192820', '192821', '192822', '192827', '209731', '214397', '214555', '214556', '214564', '215215', '239485', '239702', '243262', '247318', '247322', '247326', '247333', '247335', '247338', '247339', '247418', '263659', '268579', '278540', '285317', '285318', '285319', '285329', '285360', '285370', '296829', '333371', '347803', '347805', '350706', '350709', '350710', '350731']
    nosy_count = 19.0
    nosy_names = ['lemburg', 'loewis', 'barry-scott', 'ronaldoussoren', 'ncoghlan', 'mattheww', 'ned.deily', 'r.david.murray', 'Dmitry.Jemerov', 'larryv', 'serhiy.storchaka', 'wolma', 'Ilya.Kulakov', 'tsparber', 'karolyi', 'alexander.sturm', 'miss-islington', 'Tiger-222', 'rfrail3']
    pr_nums = ['14736', '15569', '15570']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue18378'
    versions = ['Python 3.7', 'Python 3.8', 'Python 3.9']

    DmitryJemerov mannequin commented Jul 6, 2013

    On Mac OS X 10.8 with the default language set to English (System Preferences | Language and Text), the default terminal application sets the LC_CTYPE environment variable to "UTF-8". If you run Python from the terminal and try to use locale.getdefaultlocale(), you get the following error:

    > python
    Python 2.7.2 (default, Oct 11 2012, 20:14:37)
    [GCC 4.2.1 Compatible Apple Clang 4.0 (tags/Apple/clang-418.0.60)] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import locale
    >>> locale.getdefaultlocale()
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/locale.py", line 496, in getdefaultlocale
        return _parse_localename(localename)
      File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/locale.py", line 428, in _parse_localename
        raise ValueError, 'unknown locale: %s' % localename
    ValueError: unknown locale: UTF-8

    (The stacktrace is from Python 2.7 but Python 3.3 suffers from the same problem.)

    There are numerous workarounds for this problem (turning off the "Set locale environment variables on startup" option in the terminal settings, adding "export LC_CTYPE=en_US.UTF8" to .bash_profile, or selecting a language other than English in the Language & Text settings), but all of them require additional configuration on the user's side.

    I think the more useful behavior is for Python to handle this system behavior and not crash, even though it doesn't strictly comply with the POSIX standard.

    The attached patch (against current Python 3.4 master branch) is one possible fix.
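    Until the locale module accepts this value, a caller such as Sphinx or docutils could guard the call itself. The following is a minimal caller-side sketch, not the attached patch: the helper name `safe_default_locale` is hypothetical, and it assumes the caller can live with `(None, "UTF-8")` whenever the first non-empty locale variable is the bare name "UTF-8".

```python
import locale
import os
import warnings

DEFAULT_ENVVARS = ("LC_ALL", "LC_CTYPE", "LANG", "LANGUAGE")


def safe_default_locale(envvars=DEFAULT_ENVVARS):
    """Like locale.getdefaultlocale(), but tolerate a bare "UTF-8" value.

    Hypothetical caller-side workaround: if the standard call rejects the
    locale name and the first non-empty variable is exactly "UTF-8" (as
    macOS Terminal may set LC_CTYPE), report (None, "UTF-8") instead.
    """
    with warnings.catch_warnings():
        # getdefaultlocale() emits a DeprecationWarning on Python 3.11+.
        warnings.simplefilter("ignore", DeprecationWarning)
        try:
            return locale.getdefaultlocale(envvars)
        except ValueError:
            # getdefaultlocale() uses the first non-empty variable, in order.
            value = next((os.environ[v] for v in envvars if os.environ.get(v)), None)
            if value == "UTF-8":
                return None, "UTF-8"
            raise
```

    Any other unparseable name still raises ValueError, so the workaround stays narrow.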

    DmitryJemerov mannequin added the stdlib label (Python modules in the Lib dir) on Jul 6, 2013
    @ronaldoussoren (Contributor)

    Strange, I have LANG=en_US.UTF-8 in my environment and no LC_CTYPE. A clean test account does have the same behavior as you are seeing.

    @ronaldoussoren (Contributor)

    The UTF-8 value seems suspect to me, but it is actually supported by the system: changing it to a nonsense value results in a failure in the C function setlocale.

    As for the patch: I'd add this workaround only to the OSX platform (that is, test for sys.platform == 'darwin' before checking for UTF-8 as a value).
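    The platform-gated variant could look roughly like the parser in this sketch. `parse_ctype` is a hypothetical, deliberately simplified helper (it is not part of the locale module and skips the `locale.normalize()` step the real code performs); it only shows where a `sys.platform == "darwin"` check would sit.

```python
import sys


def parse_ctype(value, platform=sys.platform):
    """Split an LC_CTYPE value into (language, encoding).

    Hypothetical, simplified helper: handles "lang_REGION.ENCODING[@mod]",
    "C"/"POSIX", and the bare "UTF-8" locale, the latter on macOS only.
    """
    if "." in value:
        language, encoding = value.split(".", 1)
        # Drop any "@modifier" suffix such as "@euro".
        return language, encoding.split("@", 1)[0]
    if value in ("C", "POSIX"):
        return None, None
    if value == "UTF-8" and platform == "darwin":
        # macOS ships a "UTF-8" locale that only defines LC_CTYPE.
        return None, "UTF-8"
    raise ValueError(f"unknown locale: {value}")
```

    Dmitry's ungated version would simply drop the `platform == "darwin"` condition; the trade-off is whether Linux should then accept a name its own setlocale() rejects.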

    ronaldoussoren added the type-bug label (An unexpected behavior, bug, or error) on Jul 6, 2013
    DmitryJemerov mannequin commented Jul 6, 2013

    Judging from the results of Googling for the error message, I'm far from the only one seeing this problem.

    What exactly would be the benefit of adding the code to check for the platform?

    @ronaldoussoren (Contributor)

    The test for darwin is needed because other platforms don't support "UTF-8" as a valid LC_CTYPE name. On a recent Linux box:

    >>> locale.setlocale(locale.LC_CTYPE, "UTF-8")
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/opt/python2.7/lib/python2.7/locale.py", line 539, in setlocale
        return _setlocale(category, locale)
    locale.Error: unsupported locale setting

    (And just calling setlocale to check if the value is valid is not an option because that changes process-global state)

    DmitryJemerov mannequin commented Jul 6, 2013

    Why exactly does this matter? UTF-8 not being a valid LC_CTYPE value simply means that no one running Linux will ever have LC_CTYPE set to UTF-8, and the branch will never be hit.

    OTOH, adding the check will make the code harder to test and simply larger (no code is always better than any non-zero amount of code).

    DmitryJemerov mannequin commented Jul 6, 2013

    A related issue (with a patch that touches the same locale parsing code) is http://bugs.python.org/issue5815

    loewis mannequin commented Jul 10, 2013

    Why do you need the "getdefaultlocale" function in the first place? I'd advise against using it, precisely because it can trigger problems like this one.

    DmitryJemerov mannequin commented Jul 10, 2013

    I personally don't, but the function is used by Sphinx, which is what I was trying to get to work when I ran into this problem.

    @bitdancer (Member)

    Regardless of the resolution here, the use of getdefaultlocale could be reported as a bug on the Sphinx tracker.

    loewis mannequin commented Jul 10, 2013

    FWIW, I couldn't find any use of getdefaultlocale in any of the hg revisions (using hg grep) in

    https://bitbucket.org/birkenfeld/sphinx/

    Instead, it's (probably) docutils, which has this code:

        locale_encoding = locale.getlocale()[1] or locale.getdefaultlocale()[1]
        # locale.getpreferredencoding([do_setlocale=True|False])
        # has side-effects | might return a wrong guess.
        # (cf. Update 1 in http://stackoverflow.com/questions/4082645/using-python-2-xs-locale-module-to-format-numbers-and-currency)

    I find that quite unfortunate, since locale.getpreferredencoding() would have done the right thing (IMO).
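    A sketch of the alternative suggested here, assuming only the encoding is needed: keep the `locale.getlocale()` attempt that docutils makes, but fall back to `locale.getpreferredencoding(False)`, which does not call `setlocale()` and never has to parse the locale name. The function name is hypothetical.

```python
import locale


def text_encoding_guess():
    """Guess the preferred text encoding without parsing the locale name.

    getlocale() may raise ValueError for names the locale module does not
    understand (such as a bare "UTF-8"), so that case falls through to
    getpreferredencoding(False), which does not change global state.
    """
    try:
        encoding = locale.getlocale()[1]
    except ValueError:
        encoding = None
    return encoding or locale.getpreferredencoding(False)
```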

    @ronaldoussoren (Contributor)

    I just ran into this problem myself.

    On fresh installs of OSX 10.9, LC_CTYPE is set to "UTF-8" (at least for English-language users), and now Sphinx won't work :-(

    Is Dmitry's patch acceptable (either as is, or with my suggestion of checking for sys.platform == "darwin")?

    @ned-deily (Member)

    Ronald or Dmitry, can you elaborate on the conditions under which you start your login shell on 10.9? I cannot reproduce the behavior you observe. With 10.9 Terminal.app, the default language settings in System Preferences, and the default Terminal.app preferences (specifically Settings -> (Profile) -> Advanced -> Character encoding -> Unicode (UTF-8), with "Set LANG environment variable on startup" checked), login sessions have LANG=en_US.UTF-8 defined and LC_CTYPE is not defined at all. Are you sure that isn't being created by a shell profile somewhere? (I can't check earlier OS X releases at the moment.) That said, I agree that if OS X accepts "UTF-8" as a valid locale, the locale module should, too.

    @ronaldoussoren (Contributor)

    I didn't get this on my previous system (which was basically a 10.4 system updated through 10.5, 10.7, ..., to 10.9), but did get it on my current system, which has a fresh 10.9 install where I did not use the Migration Assistant to migrate settings.

    Thus for me to get the behavior with LC_CTYPE:

    • New system with OSX 10.9 pre-installed
    • Select "English" as the primary language
    • Start Terminal.app and inspect the environment

    I have not tried to reproduce this in a VM.

    BTW, I have the same system settings as you.

    @ronaldoussoren (Contributor)

    With the following C code:

    #include <locale.h>
    #include <stdio.h>
    
    int main(void)
    {
    	char* res = setlocale(LC_CTYPE, "UTF-8");
    	printf("Result: %s\n", res);
    
    	res = setlocale(LC_CTYPE, "UTF-9");
    	printf("Result: %s\n", res);
    	return 0;
    }
    /* EOF */

    I get the following output:

    Result: UTF-8
    Result: (null)

    That is, UTF-8 is a valid locale for LC_CTYPE, and as expected some other string isn't.

    BTW. "UTF-8" is only a valid locale for LC_CTYPE, not for other categories (when you change LC_CTYPE to LC_ALL both calls return NULL).
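    The same experiment can be approximated from Python with `locale.setlocale()`, which raises `locale.Error` where the C function returns NULL. This is a sketch for interactive checks only: `probe_locale` is a hypothetical helper, and although it restores the previous settings, `setlocale()` changes process-global state, so it is not safe to run while other threads are active. Going by the C results, one would expect `{'LC_CTYPE': True, 'LC_ALL': False}` for "UTF-8" on macOS.

```python
import locale


def probe_locale(name):
    """Report whether setlocale() accepts `name` for LC_CTYPE and for LC_ALL.

    Each category is restored afterwards, but setlocale() changes
    process-global state, so this is not thread-safe.
    """
    results = {}
    for label, category in (("LC_CTYPE", locale.LC_CTYPE), ("LC_ALL", locale.LC_ALL)):
        saved = locale.setlocale(category)  # no second argument: query only
        try:
            locale.setlocale(category, name)
            results[label] = True
        except locale.Error:
            results[label] = False
        finally:
            locale.setlocale(category, saved)
    return results
```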

    @bitdancer (Member)

    That is seriously broken on Apple's part. But I guess we have no choice but to emulate their bug.

    @ned-deily (Member)

    I've looked at this a bit, primarily on OS X 10.9 Mavericks, although I expect mostly similar behavior on other recent releases of OS X. On 10.9, locale variables are set by whatever program is used to launch a shell. I looked at the behavior of the built-in Terminal.app, the third-party iTerm2.app, the MacPorts distribution of xterm, and the built-in sshd. By default, the latter two do not set any locale env variables. Both Terminal.app and iTerm2.app set either LANG or LC_CTYPE based on the user's settings for "Region" and "Preferred Language" in the "System Preferences" -> "Language & Region" control panel. Three examples:

    1. "Region" = "United States", "Preferred Language" = "English":
      -> LANG=en_US.UTF-8

    2. "Region" = "Germany", "Preferred Language" = "German":
      -> LANG=de_DE.UTF-8

    3. "Region" = "Germany", "Preferred Language" = "English":
      -> LC_CTYPE=UTF-8

    So it is almost certainly the last case that is under discussion here. Whether or not that is a bug is not as clear as it might seem at first. BSD implementations of locale differ from the GNU Linux version. Both FreeBSD and OS X define a "UTF-8" locale that has only one locale category defined in it: LC_CTYPE. It appears to be a fallback locale used when there is no applicable region / language combination, in this case no "en_DE*" locales.

    $ ls /usr/share/locale/UTF*
    LC_CTYPE

    Compare with the en_US* locales:

    $ ls /usr/share/locale/en_US*
    /usr/share/locale/en_US:
    LC_COLLATE  LC_CTYPE    LC_MESSAGES LC_MONETARY LC_NUMERIC  LC_TIME

    /usr/share/locale/en_US.ISO8859-1:
    LC_COLLATE LC_CTYPE LC_MESSAGES LC_MONETARY LC_NUMERIC LC_TIME

    /usr/share/locale/en_US.ISO8859-15:
    LC_COLLATE LC_CTYPE LC_MESSAGES LC_MONETARY LC_NUMERIC LC_TIME

    /usr/share/locale/en_US.US-ASCII:
    LC_COLLATE LC_CTYPE LC_MESSAGES LC_MONETARY LC_NUMERIC LC_TIME

    /usr/share/locale/en_US.UTF-8:
    LC_COLLATE LC_CTYPE LC_MESSAGES LC_MONETARY LC_NUMERIC LC_TIME
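    The directory comparison can be scripted. A sketch assuming the BSD-style layout shown in these listings (one directory per locale, one file or symlink per LC_* category); `locale_categories` is a hypothetical helper, and on glibc-based Linux /usr/share/locale mainly holds message catalogs, so results there are not comparable.

```python
from pathlib import Path


def locale_categories(name, root="/usr/share/locale"):
    """Return the sorted LC_* category entries a locale directory defines.

    Returns an empty list if the locale directory does not exist.
    """
    directory = Path(root) / name
    if not directory.is_dir():
        return []
    return sorted(p.name for p in directory.iterdir() if p.name.startswith("LC_"))
```

    On the macOS system listed here, `locale_categories("UTF-8")` would be expected to list only LC_CTYPE, while an en_US variant lists all six categories.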

    Now as I read the current POSIX standard, there is nothing wrong with this. AFAICT, the standard places no restriction on the format of locale names, in particular, it does not mandate that they conform to RFC 1766 or its successors. Further, the standard provides for implementation-specific locales (other than the mandatory "POSIX" aka "C" locale) and some platforms provide tools to create custom locales, e.g. mklocale(1) on FreeBSD and OS X, localedef(1) on GNU Linux. So I wonder if the locale module should really be imposing its own restrictions on locale names as it does currently.

    From IEEE Std 1003.1, 2013 Edition:
    "The capability to specify additional locales to those provided by an implementation is optional, denoted by the _POSIX2_LOCALEDEF symbol. If the option is not supported, only implementation-supplied locales are available. Such locales shall be documented using the format specified in this section. [...] The locale definition file shall contain one or more locale category source definitions, and shall not contain more than one definition for the same locale category. [...] In the event that some of the information for a locale category, as specified in this volume of POSIX.1-2008, is missing from the locale source definition, the behavior of that category, if it is referenced, is unspecified."

    There is a further complication for OS X. Apple provides a richer native API for locales, CFLocale (and its Cocoa equivalent, NSLocale). So some nuances may get lost in the imperfect mapping between CFLocale and the conventional LC_* environment variables and between them and Python. We could look at trying to support the native APIs as well.

    http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap07.html#tag_07
    https://developer.apple.com/library/mac/documentation/CoreFoundation/Conceptual/CFLocales/CFLocales.html
    https://developer.apple.com/library/mac/documentation/CoreFoundation/Reference/CFLocaleRef/Reference/reference.html

    barry-scott mannequin commented Mar 29, 2015

    Mac OS X uses the __CF_USER_TEXT_ENCODING env var to set up the locale for native libraries.

    I found that for GUI python code I needed to convert the value in __CF_USER_TEXT_ENCODING into a suitable call to setlocale().

    The code I use is attached to bpo-23797.

    @ronaldoussoren (Contributor)

    1. I agree with Ned that the OSX behavior is not broken: it is different, but within spec. Python makes assumptions about the format of locale names that aren't universally valid.

    2. We should be careful about using CFLocale. Those APIs are part of CoreFoundation, and CoreFoundation APIs cannot be used in the child process after calling os.fork.

    As an aside to 2), CoreFoundation and any other Apple "Cocoa" frameworks should be assumed to use threads, and hence the comment about threads in the fork specification (link below) applies; currently Apple doesn't appear to use pthread_atfork to make sure library state is valid in child processes after fork.

    <http://pubs.opengroup.org/onlinepubs/009695399/functions/fork.html>

    @ronaldoussoren (Contributor)

    Dmitry's patch looks good; I added my patch before checking whether there already was one.

    The only thing that might cause discussion is when to accept 'UTF-8' as a valid locale name. My patch only accepts it on OSX, while Dmitry's patch accepts it everywhere.

    Writing this, I'm slightly in favour of Dmitry's approach: I quite often run into problems when using SSH to log in to a Linux box from my OSX laptop (with LC_CTYPE=UTF-8). Almost everything works correctly, except for Python code that uses the locale module (which craps out with the exception in the first message in this issue).

    IMHO Dmitry's patch should be applied as is.

    @ronaldoussoren (Contributor)

    ping...

    I think the current behavior is a bug in Python and should be fixed in 2.7, 3.4, 3.5 and default (using Dmitry's patch).

    I'd like to commit the patch, but would like someone else's review of the patch before doing so.

    @serhiy-storchaka (Member)

    Tests are needed.

    With the patch:

    $ LC_CTYPE=UTF-8 ./python
    >>> import locale
    >>> locale.getdefaultlocale()
    (None, 'UTF-8')
    >>> locale.getpreferredencoding()
    'ANSI_X3.4-1968'
    >>> locale.getlocale()
    (None, None)
    
    $ LC_CTYPE=en_US.UTF-8 ./python
    >>> import locale
    >>> locale.getdefaultlocale()
    ('en_US', 'UTF-8')
    >>> locale.getpreferredencoding()
    'UTF-8'
    >>> locale.getlocale()
    ('en_US', 'UTF-8')

    I think getpreferredencoding() and getlocale() should return the UTF-8 encoding.

    @serhiy-storchaka (Member)

    Perhaps the better way to solve this issue is to use the aliases table. What is the LC_CTYPE environment variable set to when the default language is set to something other than English? How do native MacOS X command-line programs behave when LC_CTYPE is set to another encoding (e.g. ASCII, US-ASCII, ISO8859-1, ISO-8859-1, Latin1)? What if it is set to UTF8 (no hyphen) or utf-8 (lower case)?

    @ronaldoussoren (Contributor)

    The only locale that doesn't include language information is the UTF-8 one, there is no locale named "US-ASCII".

    See /usr/share/locale on an OSX system.

    PS. The more I look at locale.py, the more problems I find with it. The code makes unwarranted assumptions about locales that aren't actually true on all systems.

    For example:

    >>> locale.normalize('ja_JP')
    'ja_JP.eucJP'

    That's not true on OSX, /usr/share/locale/ja_JP/LC_CTYPE is a symlink to /usr/share/locale/UTF-8/LC_CTYPE.

    AFAIK *all* locales on OSX use UTF-8.
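    A quick way to check the symlink claim directly, rather than relying on the encoding `locale.normalize()` infers from its alias table. Both helper names are hypothetical, and the default root assumes the macOS layout under /usr/share/locale.

```python
import os


def ctype_target(name, root="/usr/share/locale"):
    """Return the real path behind <root>/<name>/LC_CTYPE, following symlinks."""
    return os.path.realpath(os.path.join(root, name, "LC_CTYPE"))


def shares_utf8_ctype(name, root="/usr/share/locale"):
    """True if the locale's LC_CTYPE resolves to the bare UTF-8 locale's file."""
    return ctype_target(name, root) == ctype_target("UTF-8", root)
```

    On the system described, `shares_utf8_ctype("ja_JP")` would be expected to return True, contradicting the 'ja_JP.eucJP' guess.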

    @ronaldoussoren (Contributor)

    The alias mechanism cannot be used because LC_CTYPE=UTF-8 as the locale doesn't imply anything about languages.

    In Linux terms it is more or less equal to "C.UTF-8" or "POSIX.UTF-8", except that those two aren't valid locales on OSX.

    @ronaldoussoren (Contributor)

    Testing this is interesting to say the least due to the dynamic way the module interface is built.

    Serhiy: are you testing on a Linux machine? On my machine getpreferredencoding() returns 'UTF-8' because it hits the CODESET path (which ends up calling _locale.nl_langinfo(_locale.CODESET) and that returns UTF-8).
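    The CODESET path is reachable through the public API as `locale.nl_langinfo(locale.CODESET)`, which reports the codeset of the current LC_CTYPE setting. A small sketch, assuming a Unix-like platform; `nl_langinfo` is not available on Windows, so the hypothetical helper falls back to `getpreferredencoding(False)` there. Under the plain C locale on glibc it typically reports 'ANSI_X3.4-1968', matching the Linux output quoted earlier.

```python
import locale


def codeset():
    """Return the codeset of the current LC_CTYPE setting.

    Uses nl_langinfo(CODESET) where available (Unix-like systems) and
    getpreferredencoding(False) otherwise.
    """
    if hasattr(locale, "nl_langinfo") and hasattr(locale, "CODESET"):
        return locale.nl_langinfo(locale.CODESET)
    return locale.getpreferredencoding(False)
```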

    @ronaldoussoren (Contributor)

    I've attached a patch with more tests, but I'm not too happy about the new test because it is too much of a white-box test and is therefore fairly fragile w.r.t. the actual implementation of the module.

    @serhiy-storchaka (Member)

    Yes, I was testing on a Linux machine and forgot that the results are OS dependent.

    I agree that the test should depend less on implementation details. Since _locale._getdefaultlocale is defined only on Windows, and "UTF-8" is not a valid locale on Windows, I think there is no need to patch _locale for testing. But getlocale() and getpreferredencoding() should be consistent with getdefaultlocale() (and getlocale() is another way to test the private function _parse_localename()). setlocale() should work with the result of getlocale() and getdefaultlocale(). Do the following tests pass on OSX?
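    A black-box check along these lines might look as follows. It is a sketch, assuming a POSIX platform (on Windows `getdefaultlocale()` asks the C runtime instead of reading the environment) and a Python release that includes the fix (3.7.5, 3.8.0b4 or later); `default_locale_with` is a hypothetical helper that swaps in a controlled environment via `unittest.mock.patch.dict`.

```python
import locale
import os
import sys
import unittest
import warnings
from unittest import mock


def default_locale_with(**env):
    """Call locale.getdefaultlocale() with os.environ replaced by `env`."""
    with mock.patch.dict(os.environ, env, clear=True):
        with warnings.catch_warnings():
            # Deprecated since Python 3.11; silence the warning in tests.
            warnings.simplefilter("ignore", DeprecationWarning)
            return locale.getdefaultlocale()


@unittest.skipIf(sys.platform == "win32",
                 "on Windows getdefaultlocale() does not read the environment")
@unittest.skipUnless(hasattr(locale, "getdefaultlocale"),
                     "getdefaultlocale() not available")
class DefaultLocaleUTF8Test(unittest.TestCase):
    def test_bare_utf8_is_accepted(self):
        self.assertEqual(default_locale_with(LC_CTYPE="UTF-8"), (None, "UTF-8"))

    def test_regular_name_still_parsed(self):
        self.assertEqual(default_locale_with(LC_CTYPE="en_US.UTF-8"),
                         ("en_US", "UTF-8"))

    def test_lc_all_takes_precedence(self):
        self.assertEqual(default_locale_with(LC_ALL="C", LC_CTYPE="UTF-8"),
                         (None, None))
```

    Saved to a file, it can be run with `python -m unittest`.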

    wm75 mannequin commented Apr 18, 2016

    ping?

    Just ran into this issue on OS X El Capitan with Region set to Germany and Language to English. Just as Ned pointed out 2 years ago, this results in LC_CTYPE set to 'UTF-8' in the terminal and docutils still can't cope with it.

    Kentzo mannequin commented Jun 14, 2016

    Could someone provide a patch for Python 3.5?

    karolyi mannequin commented Oct 12, 2016

    OSX Sierra + Python, the bug still exists.

    subscribing

    wm75 mannequin commented Jan 12, 2017

    To me this issue seems quite related to PEP-538. Maybe the LC_CTYPE coercion proposed in the PEP could be extended to cover the case of LC_CTYPE=UTF-8?

    wm75 mannequin added the 3.7 (EOL) label (end of life) on Jan 12, 2017
    @ncoghlan (Contributor)

    PEP-538 wouldn't help here, as there's nothing wrong with CPython's assumptions about the text encoding to use for operating system interfaces - it's assuming UTF-8 (because it's Mac OS X) and that assumption is correct (because it's Mac OS X).

    The problem appears to be that locale.py was written primarily for Linux, and hence makes assumptions that aren't valid on BSD and Mac OS X.

    Dmitry's suggested solution of taking the BSD/Mac OS X specific locale of "UTF-8" and universally accepting it as meaning (None, "UTF-8") seems like a sensible step forward, even if it doesn't resolve all the discrepancies.

    Where PEP-538 and PEP-540 would come into play is when this setting gets forwarded over SSH to Linux servers (as then CPython *will* get the nominal system text encoding wrong), but that's independent of getting the locale module to handle it more gracefully.

    wm75 mannequin commented Jan 12, 2017

    I think PEP-538 extended to the UTF-8 locale *would* help here. Specifically, it would coerce only LC_CTYPE to en_US.UTF-8 (unless OS X has C.UTF-8), which I guess is good enough for the purpose here.

    I do agree that it is not the kind of problem that PEP-538 tries to solve right now, but it could be extended to cover other types of problematic locales like this one. Just wanted to make you aware of this possibility.

    @malemburg (Member)

    I think Ronald's patch issue18378-2015-07-25-py36.txt with an added darwin check would be the best way forward.

    In the current form, it would allow using 'UTF-8' as locale string on all platforms - which is not such a good idea.

    @ncoghlan (Contributor)

    SSH environment forwarding will propagate this "LC_CTYPE=UTF-8" setting from Mac OS X clients to Linux servers.

    At present, that breaks in multiple ways, as CPython will interpret it as being the "C" locale (since Linux servers don't offer a "UTF-8" locale, even when they do offer "C.UTF-8")

    PEPs 538 and 540 aim to help CPython itself to deal with that case, but that won't be sufficient to help code that tries to pass the nominal LC_CTYPE setting to the locale module.

    Accepting "UTF-8" and interpreting it as functionally equivalent to C.UTF-8 will mean that this setting will at least work as desired on servers that offer C.UTF-8.
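    This proposal amounts to a translation step before the value reaches `setlocale()` on Linux. The helper in this sketch is hypothetical and only illustrates the idea; as the reply points out, whether the locale module should do this on Linux at all is disputed.

```python
import sys


def ctype_for_setlocale(value, platform=sys.platform):
    """Map a forwarded LC_CTYPE value to one setlocale() is more likely to accept.

    Hypothetical helper: on Linux the bare macOS "UTF-8" locale is rewritten
    to "C.UTF-8" (which only helps if the server provides that locale);
    every other value, and every other platform, is passed through unchanged.
    """
    if value == "UTF-8" and platform.startswith("linux"):
        return "C.UTF-8"
    return value
```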

    @malemburg (Member)

    On 13.01.2017 04:47, Nick Coghlan wrote:

    Accepting "UTF-8" and interpreting it as functionally equivalent to C.UTF-8 will mean that this setting will at least work as desired on servers that offer C.UTF-8.

    I don't think that's within the scope of this patch. "UTF-8" is not
    a valid locale setting on Linux and so Python should not allow
    passing this through the locale normalization process on Linux.

    Please also note that SSH does not forward arbitrary env vars.
    Only a select few are forwarded and all others have to be
    configured. The locale vars are not among the default ones
    (see the ssh man page for details).

    Aside: While looking into this I found that the locale module
    aliases C.UTF-8 to en_US.UTF-8. This was added as part of
    issue bpo-20076 and originates from the X.org locale.alias file.
    Time machine and all that :-)

    mattheww mannequin commented Jun 25, 2017

    That alias (C.UTF-8 to en_US.UTF-8) is surely a bug in itself nowadays. I've filed bpo-30755 .

    rfrail3 mannequin commented Jan 10, 2019

    I still have this issue on MacOS Mojave 10.14

    Python 3.7.2 (default, Dec 27 2018, 07:35:06)
    [Clang 10.0.0 (clang-1000.11.45.5)] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import locale
    >>> locale.getdefaultlocale()
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/usr/local/Cellar/python/3.7.2/Frameworks/Python.framework/Versions/3.7/lib/python3.7/locale.py", line 568, in getdefaultlocale
        return _parse_localename(localename)
      File "/usr/local/Cellar/python/3.7.2/Frameworks/Python.framework/Versions/3.7/lib/python3.7/locale.py", line 495, in _parse_localename
        raise ValueError('unknown locale: %s' % localename)
    ValueError: unknown locale: UTF-8
    >>>
    $ locale
    LANG=
    LC_COLLATE="C"
    LC_CTYPE="UTF-8"
    LC_MESSAGES="C"
    LC_MONETARY="C"
    LC_NUMERIC="C"
    LC_TIME="C"
    LC_ALL=

    @ronaldoussoren (Contributor)

    LC_CTYPE=UTF-8 is a valid configuration on macOS, and is in the default environment when you install a fresh system. This includes the betas for macOS 10.15, so it is unlikely to change anytime soon.

    Interestingly enough I get this error even when I unset the relevant environment variables. For some reason LC_CTYPE is reset when I start the interpreter, even if it is set to something else. This means the usual way of working around this problem no longer works.

    I'll create a pull request with an up-to-date version of my latest patch for further discussion.

    BTW. I'm testing with the current tip of the tree, but 3.7.3 fails in the same way.

    @ronaldoussoren (Contributor)

    As promised there is now a pull request.

    I'd love a review (and a chance to approve the pull request when reviewers are happy; I'm trying to get back into actively contributing).

    ---

    I now understand why locale.getdefaultlocale() fails even when LC_CTYPE is not set: pylifecycle sets LC_CTYPE to UTF-8 in the UTF-8 coercion code.
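    When diagnosing this, it helps to see the environment and the LC_CTYPE setting actually in effect side by side, since, as explained here, the interpreter's own UTF-8 coercion at startup can put "UTF-8" into LC_CTYPE on macOS. A hypothetical diagnostic sketch using only public APIs (`sys.flags.utf8_mode` needs Python 3.7+):

```python
import locale
import os
import sys


def locale_report():
    """Gather the locale-related settings discussed in this issue.

    setlocale() is called without a locale argument, which only queries
    the current LC_CTYPE setting and does not change it.
    """
    return {
        "environment": {name: os.environ.get(name) for name in ("LC_ALL", "LC_CTYPE", "LANG")},
        "lc_ctype_in_effect": locale.setlocale(locale.LC_CTYPE),
        "utf8_mode": bool(sys.flags.utf8_mode),
        "preferred_encoding": locale.getpreferredencoding(False),
    }


if __name__ == "__main__":
    for key, value in locale_report().items():
        print(f"{key}: {value}")
```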

    @ned-deily (Member)

    New changeset b0caf32 by Ned Deily (Ronald Oussoren) in branch 'master':
    bpo-18378: Recognize "UTF-8" as a valid name in locale._parse_localename (GH-14736)

    @miss-islington (Contributor)

    New changeset 554143e by Miss Islington (bot) in branch '3.7':
    bpo-18378: Recognize "UTF-8" as a valid name in locale._parse_localename (GH-14736)

    @miss-islington (Contributor)

    New changeset e471a54 by Miss Islington (bot) in branch '3.8':
    bpo-18378: Recognize "UTF-8" as a valid name in locale._parse_localename (GH-14736)

    @ned-deily (Member)

    Ronald's PR 14736 LGTM. I merged it to master and backported it for 3.8.0b4 and 3.7.5. Thanks, everyone!
