classification
Title: os.stat raises exception when using unicode and no locale is set
Type: behavior
Stage: resolved
Components:
Versions: Python 3.6, Python 3.5

process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To:
Nosy List: eryksun, iritkatriel, ncoghlan, r.david.murray, sejvlond, vstinner
Priority: normal
Keywords:

Created on 2015-12-15 08:33 by sejvlond, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Messages (5)
msg256446 - (view) Author: Ondrej Sejvl (sejvlond) Date: 2015-12-15 08:33
os.stat() raises UnicodeEncodeError when the path is unicode and no locale is set in the environment (this occurs when running an app with daemon tools -> LC_ALL=)

How to simulate:
$ env -i python
>>> import os
>>> os.stat(u"\xf0")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\xf0' in position 0: ordinal not in range(128)

Is this valid behaviour? If so, a note in the documentation would be nice (I am using os.path.isfile and it now raises UnicodeEncodeError...)

Thanks
Ondra
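
Editor's note: a minimal defensive sketch for the os.path.isfile case described above, assuming the caller is happy to treat a path that cannot be encoded for the filesystem as "not a file". The helper name safe_isfile is hypothetical and is not part of the os module.

```python
import os


def safe_isfile(path):
    """Hypothetical helper: like os.path.isfile(), but never lets a
    UnicodeEncodeError escape.

    Assumption: a path that cannot be encoded with the current
    filesystem encoding cannot name an existing regular file that
    this process could open, so False is an acceptable answer.
    """
    try:
        return os.path.isfile(path)
    except UnicodeEncodeError:
        # Raised when the filesystem encoding (e.g. ASCII under an
        # empty/C locale) cannot represent a character in the path.
        return False


if __name__ == "__main__":
    print(safe_isfile(u"\xf0"))
```

On interpreters where os.path.isfile already swallows the error, the except branch is simply never taken.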
msg256460 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2015-12-15 14:33
I don't think this is "intentional", but instead is a consequence of how things used to work before we started to try to handle empty/C locale better.  It applies to python3 as well.  Since we only started trying to handle this in python3, I'm not sure that it can be fixed in python2.
msg400328 - (view) Author: Irit Katriel (iritkatriel) * (Python committer) Date: 2021-08-26 09:32
It's doing this now, so seems like it has been fixed:

% env -i ./python.exe
Python 3.11.0a0 ...
>>> import os
>>> os.stat(u"\xf0")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
FileNotFoundError: [Errno 2] No such file or directory: 'ð'
>>>
msg400338 - (view) Author: Eryk Sun (eryksun) * (Python triager) Date: 2021-08-26 11:55
> It's doing this now, so seems like it has been fixed

Yes. In POSIX systems since Python 3.7, if the LC_CTYPE locale is the legacy "C" or "POSIX" locale, by default it tries to coerce LC_CTYPE to "C.UTF-8", "C.utf8", or "UTF-8". If coercion fails or is disabled (e.g. by defining LC_ALL), the interpreter will still use UTF-8 for the filesystem encoding if UTF-8 mode isn't disabled. If UTF-8 mode is also disabled, then ASCII is used. For example:

    $ LC_CTYPE=C PYTHONCOERCECLOCALE= PYTHONUTF8= python -c 'import sys; print(sys.getfilesystemencoding())'
    utf-8
    $ LC_CTYPE=C PYTHONCOERCECLOCALE=0 PYTHONUTF8= python -c 'import sys; print(sys.getfilesystemencoding())'
    utf-8
    $ LC_CTYPE=C PYTHONCOERCECLOCALE=0 PYTHONUTF8=0 python -c 'import sys; print(sys.getfilesystemencoding())'
    ascii
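
Editor's note: the three shell invocations above can also be driven from Python. This is a rough sketch, assuming Python 3.7 or later; the function name filesystem_encoding and its parameters are made up for illustration. It removes LC_ALL and LANG from the child environment so that LC_CTYPE=C actually takes effect, which is an assumption about the caller's environment rather than something stated in the message above. The result depends on the platform and Python version.

```python
import os
import subprocess
import sys


def filesystem_encoding(coerce_c_locale="", utf8_mode=""):
    """Hypothetical helper: start a child interpreter with LC_CTYPE=C
    and report the filesystem encoding it selects.

    coerce_c_locale is passed as PYTHONCOERCECLOCALE and utf8_mode as
    PYTHONUTF8; an empty string mirrors the "VAR=" form used in the
    shell example.
    """
    env = dict(os.environ)
    # Assumption: drop variables that would override or mask LC_CTYPE.
    for name in ("LC_ALL", "LANG"):
        env.pop(name, None)
    env["LC_CTYPE"] = "C"
    env["PYTHONCOERCECLOCALE"] = coerce_c_locale
    env["PYTHONUTF8"] = utf8_mode
    result = subprocess.run(
        [sys.executable, "-c",
         "import sys; print(sys.getfilesystemencoding())"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    # Same three combinations as the shell session above.
    for coerce, utf8 in (("", ""), ("0", ""), ("0", "0")):
        print(repr(coerce), repr(utf8), filesystem_encoding(coerce, utf8))
```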
msg400623 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-08-30 15:31
> It's doing this now, so seems like it has been fixed: % env -i ./python.exe (...)

Right. It's correct to close the issue.

PEP 540 added a UTF-8 Mode. This mode is enabled if Python is started with the "C" or "POSIX" locale (LC_CTYPE category). If the UTF-8 Mode is enabled, Python uses UTF-8 for its "filesystem" encoding:

* https://docs.python.org/dev/library/os.html#python-utf-8-mode
* https://docs.python.org/dev/glossary.html#term-filesystem-encoding-and-error-handler

Moreover, PEP 538 also tries to use a UTF-8 variant of the "C" and "POSIX" locales, which also fixes this issue.

I documented how Python configures its "filesystem encoding" at:
https://docs.python.org/dev/c-api/init_config.html#c.PyConfig.filesystem_encoding
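
Editor's note: to see what a running interpreter actually chose, something like the sketch below may help. It assumes Python 3.7 or later (for sys.flags.utf8_mode); the function names are hypothetical and only wrap standard sys and os calls.

```python
import os
import sys


def describe_filesystem_encoding():
    """Hypothetical helper: collect the settings that decide how str
    paths are converted to bytes for calls such as os.stat()."""
    return {
        "encoding": sys.getfilesystemencoding(),
        "errors": sys.getfilesystemencodeerrors(),
        "utf8_mode": bool(sys.flags.utf8_mode),
    }


def can_encode_filename(name):
    """Return True if os.fsencode() can convert name to bytes with the
    current filesystem encoding and error handler."""
    try:
        os.fsencode(name)
    except UnicodeEncodeError:
        # For example, a non-ASCII character under an ASCII encoding.
        return False
    return True


if __name__ == "__main__":
    print(describe_filesystem_encoding())
    print(can_encode_filename("\xf0"))
```

Whether the second call prints True depends on the encoding in effect: it should succeed under UTF-8 and fail under ASCII.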

History
Date User Action Args
2022-04-11 14:58:24 admin set github: 70054
2021-08-30 15:31:09 vstinner set messages: + msg400623
2021-08-26 11:55:03 eryksun set status: open -> closed; nosy: + eryksun; messages: + msg400338; resolution: fixed; stage: needs patch -> resolved
2021-08-26 09:32:16 iritkatriel set nosy: + iritkatriel; messages: + msg400328
2015-12-15 14:33:40 r.david.murray set versions: + Python 3.5, Python 3.6; nosy: + r.david.murray, ncoghlan, vstinner; messages: + msg256460; type: behavior; stage: needs patch
2015-12-15 08:33:11 sejvlond create