classification
Title: LC_ALL=en_US + io.open() => LookupError: (osx)
Type: crash
Stage: resolved
Components: Library (Lib), macOS
Versions: Python 2.7
process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To:
Nosy List: Anthony Sottile, ned.deily, ronaldoussoren, vstinner
Priority: normal
Keywords:

Created on 2017-05-10 22:22 by Anthony Sottile, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Messages (7)
msg293457 - Author: Anthony Sottile (Anthony Sottile) Date: 2017-05-10 22:22
Originally seen here: https://github.com/Microsoft/vscode/issues/26227

```
$ LC_ALL=en_US python -c 'import io; io.open("/dev/null")'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
LookupError: unknown encoding: 
```

Admittedly, that `LC_ALL` looks malformed (it should be `en_US.UTF-8`), but the same command works on Linux:

```
$ env -i LC_ALL=en_US python -c 'import io; io.open("/dev/null")'
$
```

It may be an OSX specific bug?

I've only tagged py27 + py36 because I did not have a build toolchain available to try on master, though I imagine it is reproducible there as well.
msg293460 - Author: STINNER Victor (vstinner) (Python committer) Date: 2017-05-10 23:11
> It may be an OSX specific bug?

Yes, on Linux, it "just works":

haypo@selma$ LC_ALL=en_US python2 -c 'import io; io.open("/dev/null")'
haypo@selma$ LC_ALL=en_USxxx python2 -c 'import io; io.open("/dev/null")'

In fact, you get ASCII encoding for these two locales:

haypo@selma$ LC_ALL=en_US python2 -c 'import locale; print(locale.getpreferredencoding(False))'
ANSI_X3.4-1968

haypo@selma$ LC_ALL=en_USxxx python2 -c 'import locale; print(locale.getpreferredencoding(False))'
ANSI_X3.4-1968

Internally, io.open() uses locale.getpreferredencoding(False) if you don't specify an encoding.
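
A minimal sketch of why an empty encoding name ends in `LookupError`, assuming the default encoding is resolved through the codec registry in the same way `codecs.lookup()` resolves names. The helper `is_known_encoding` is a made-up name for illustration; it is not part of `io` or `locale`.

```python
import codecs
import locale


def is_known_encoding(name):
    """Return True if Python's codec registry can resolve this encoding name."""
    try:
        codecs.lookup(name)
        return True
    except LookupError:
        # An empty or unrecognized name fails here with "unknown encoding: <name>".
        return False


# The encoding that is used when io.open() is called without encoding=...
default = locale.getpreferredencoding(False)
print(repr(default), is_known_encoding(default))
```

On a system where the locale reports an empty codeset, the check on `default` would return `False`, which matches the traceback in the original report.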
msg293509 - Author: Ronald Oussoren (ronaldoussoren) (Python committer) Date: 2017-05-11 16:43
On macOS 10.12:

ronald$ LC_ALL=en_US python2.7 -c 'import locale; print(repr(locale.getpreferredencoding()))'
''
ronald$ LC_ALL=en_US python3.6 -c 'import locale; print(repr(locale.getpreferredencoding()))'
'UTF-8'

getpreferredencoding uses the CODESET path on macOS, which means the result above is explained by this session (Python 2.7):

>>> import locale
>>> locale.setlocale(locale.LC_ALL, '')
'en_US'
>>> locale.nl_langinfo(locale.CODESET)
''

Note that _pyio uses locale.getpreferredencoding(), not locale.getpreferredencoding(False). The latter would use US-ASCII as the encoding:

>>> import locale
>>> locale.nl_langinfo(locale.CODESET)
'US-ASCII'


I guess the empty string for the encoding is explained by the following shell session that looks at the locale information:

$ LC_ALL=en_US.UTF-8 locale -ck LC_ALL | grep charmap
charmap="UTF-8"

$ LC_ALL=en_US locale -ck LC_ALL | grep charmap
charmap=

In python3 locale.getpreferredencoding (or rather, the same function in _bootlocale) was tweaked to deal with this problem:

if not result and sys.platform == 'darwin':
    # nl_langinfo can return an empty string
    # when the setting has an invalid value.
    # Default to UTF-8 in that case because
    # UTF-8 is the default charset on OSX and
    # returning nothing will crash the
    # interpreter.
    result = 'UTF-8'

Backporting this to 2.7 would IMHO be the best way to fix this issue.
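
The fallback quoted above can be sketched as a standalone function. This is only an illustration of the logic, not the code that was backported: `resolve_encoding` and its `platform` parameter are hypothetical names, introduced so the behaviour can be tried on any OS.

```python
import sys


def resolve_encoding(codeset, platform=sys.platform):
    """Map an empty CODESET result to UTF-8 on macOS; pass other values through.

    `codeset` stands in for the value returned by nl_langinfo(CODESET).
    `platform` defaults to sys.platform and is a parameter only for testing.
    """
    if not codeset and platform == 'darwin':
        # Same rule as the Python 3 snippet: empty result on macOS means UTF-8.
        return 'UTF-8'
    return codeset


print(resolve_encoding('', platform='darwin'))
print(resolve_encoding('US-ASCII', platform='darwin'))
```

Keeping the check tied to `'darwin'` mirrors the quoted snippet, so an empty result on other platforms is left untouched rather than silently replaced.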
msg293534 - Author: STINNER Victor (vstinner) (Python committer) Date: 2017-05-12 09:29
I created a PR to backport bpo-6393 to Python 2.7: https://github.com/python/cpython/pull/1555
msg293538 - Author: STINNER Victor (vstinner) (Python committer) Date: 2017-05-12 09:53
> Note that _pyio uses locale.getpreferredencoding(), not locale.getpreferredencoding(False).

Oh, it's a difference between Python 2.7 and Python 3.

Python 3 calls setlocale(LC_CTYPE, "") at startup, so locale.getpreferredencoding(False) can be used in Python 3.

Python 2.7 requires calling locale.getpreferredencoding() to get the encoding of the *user* locale.
msg293540 - Author: STINNER Victor (vstinner) (Python committer) Date: 2017-05-12 10:12
Thanks for your bug report Anthony Sottile! It's now fixed!


My backport was merged.

New changeset 94a3694c3dda97e3bcb51264bf47d948c5424d84 by Victor Stinner in branch '2.7':
bpo-6393: Fix locale.getprerredencoding() on macOS (#1555)
https://github.com/python/cpython/commit/94a3694c3dda97e3bcb51264bf47d948c5424d84

Before, without the fix:

macbook:2.7 haypo$ LANG=en_US ./python.exe -c 'import io; print(io.open("setup.py").encoding)'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
LookupError: unknown encoding: 


After, with the fix:

macbook:2.7 haypo$ LANG=en_US ./python.exe -c 'import io; print(io.open("setup.py").encoding)'
UTF-8


--

Python 3.7 is not affected:

macbook:master haypo$ LANG=en_US ./python.exe -c 'import locale; print(locale.getpreferredencoding())'
UTF-8

macbook:master haypo$ LANG=en_US ./python.exe -c 'import io; print(io.open("setup.py").encoding)'
UTF-8
msg293541 - Author: STINNER Victor (vstinner) (Python committer) Date: 2017-05-12 10:13
The best "workaround" on Python 2.7 is to specify the encoding: replace io.open(filename) with io.open(filename, encoding="utf8").

It's always better to specify the encoding ;-)
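
A small sketch of that workaround, assuming the file content is UTF-8. `read_text` is a hypothetical helper name, and the temporary file exists only to make the example self-contained.

```python
import io
import os
import tempfile


def read_text(path, encoding="utf-8"):
    """Read a text file with an explicit encoding, independent of the locale."""
    with io.open(path, encoding=encoding) as f:
        return f.read()


# Round-trip a small file to show the call shape.
fd, path = tempfile.mkstemp(suffix=".txt")
os.close(fd)
try:
    with io.open(path, "w", encoding="utf-8") as f:
        f.write(u"caf\u00e9\n")
    print(repr(read_text(path)))
finally:
    os.remove(path)
```

Because the encoding is passed explicitly, the locale lookup is never consulted, so a malformed `LC_ALL` or `LANG` should not affect the call.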
History
Date                 User             Action  Args
2022-04-11 14:58:46  admin            set     github: 74523
2017-05-12 10:13:47  vstinner         set     messages: + msg293541
2017-05-12 10:12:16  vstinner         set     status: open -> closed
                                              versions: - Python 3.6
                                              messages: + msg293540
                                              resolution: fixed
                                              stage: resolved
2017-05-12 09:53:58  vstinner         set     messages: + msg293538
2017-05-12 09:29:05  vstinner         set     messages: + msg293534
2017-05-11 16:43:23  ronaldoussoren   set     messages: + msg293509
2017-05-10 23:11:56  vstinner         set     nosy: + vstinner
                                              messages: + msg293460
2017-05-10 22:22:18  Anthony Sottile  create