
classification
Title: python -v crashes in nonencodable directory
Type: crash Stage: resolved
Components: Interpreter Core Versions: Python 3.6, Python 3.4, Python 3.5
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: serhiy.storchaka Nosy List: Arfrever, brett.cannon, eric.snow, ncoghlan, python-dev, serhiy.storchaka, vstinner
Priority: normal Keywords: patch

Created on 2015-09-19 20:48 by serhiy.storchaka, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
stdprinter_backslashreplace.patch serhiy.storchaka, 2015-09-30 09:22 review
Messages (12)
msg251115 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-09-19 20:48
$ pwd
/home/serhiy/py/cpy�thon-3.5
$ ./python -v
import _frozen_importlib # frozen
import _imp # builtin
import sys # builtin
import '_warnings' # <class '_frozen_importlib.BuiltinImporter'>
import '_thread' # <class '_frozen_importlib.BuiltinImporter'>
import '_weakref' # <class '_frozen_importlib.BuiltinImporter'>
import '_frozen_importlib_external' # <class '_frozen_importlib.FrozenImporter'>
import '_io' # <class '_frozen_importlib.BuiltinImporter'>
import 'marshal' # <class '_frozen_importlib.BuiltinImporter'>
import 'posix' # <class '_frozen_importlib.BuiltinImporter'>
import _thread # previously loaded ('_thread')
import '_thread' # <class '_frozen_importlib.BuiltinImporter'>
import _weakref # previously loaded ('_weakref')
import '_weakref' # <class '_frozen_importlib.BuiltinImporter'>
# installing zipimport hook
import 'zipimport' # <class '_frozen_importlib.BuiltinImporter'>
# installed zipimport hook
Fatal Python error: Py_Initialize: Unable to get the locale encoding
Traceback (most recent call last):
  File "<frozen importlib._bootstrap>", line 969, in _find_and_load
# destroy io
  File "<frozen importlib._bootstrap>", line 958, in _find_and_load_unlocked
# destroy io
  File "<frozen importlib._bootstrap>", line 673, in _load_unlocked
# destroy io
  File "<frozen importlib._bootstrap_external>", line 658, in exec_module
# destroy io
  File "<frozen importlib._bootstrap_external>", line 759, in get_code
# destroy io
  File "<frozen importlib._bootstrap_external>", line 368, in _verbose_message
# destroy io
UnicodeEncodeError: 'utf-8' codec can't encode character '\udcff' in position 21: surrogates not allowed
# destroy encodings
Aborted (core dumped)
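For readers wondering where '\udcff' and "position 21" come from, here is a minimal sketch. It assumes the directory name contains the raw byte 0xFF and that the path was decoded as UTF-8 with the surrogateescape error handler; the shortened message text is illustrative, not the exact string importlib produced.

```python
# Assumption: the directory name on disk contains the raw byte 0xFF,
# which is not valid UTF-8.
raw_path = b"/home/serhiy/py/cpy\xffthon-3.5"

# surrogateescape maps each undecodable byte 0xNN to the lone surrogate U+DCNN.
path = raw_path.decode("utf-8", "surrogateescape")
assert path[19] == "\udcff"

# A verbose message that starts with "# " shifts the surrogate to index 21.
message = "# " + path + "/Lib/encodings/__init__.py"

error = None
try:
    message.encode("utf-8")  # strict error handler by default
except UnicodeEncodeError as exc:
    error = exc
    print(exc.reason, "at position", exc.start)

# The original bytes are recoverable only with the same error handler.
assert path.encode("utf-8", "surrogateescape") == raw_path
```

The strict UTF-8 encoder rejects lone surrogates, which is the same failure reported in the traceback above.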
msg251122 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2015-09-19 22:10
And what happens if you leave -v off? Since the failure is in Py_Initialize I want to know if that Py_FatalError trigger is avoided without -v.

A possible fix to test is to modify importlib._bootstrap._verbose_message to catch UnicodeEncodeError, print a message saying that a string could not be encoded, and swallow the exception. I just don't know if this is the right place to prevent Py_Initialize from erroring out.
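The suggestion above could look roughly like the following. This is only a sketch of the pattern: verbose_message_safe is a hypothetical helper, not importlib's real _verbose_message, and it catches UnicodeEncodeError, the exception shown in the traceback. A strict UTF-8 TextIOWrapper stands in for the failing stream.

```python
import io


def verbose_message_safe(stream, message):
    """Hypothetical helper, not importlib's real _verbose_message."""
    try:
        stream.write(message + "\n")
    except UnicodeEncodeError:
        # Swallow the error and report that the text could not be encoded.
        stream.write("# <verbose message contains unencodable characters>\n")


raw = io.BytesIO()
# newline="\n" disables newline translation so the bytes are the same everywhere.
stream = io.TextIOWrapper(raw, encoding="utf-8", errors="strict", newline="\n")

verbose_message_safe(stream, "# /home/serhiy/py/cpy\udcffthon-3.5/Lib/io.py")
stream.flush()
print(raw.getvalue())
```

The drawback of this approach is that the whole message is lost, including the parts that could have been encoded.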
msg251123 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-09-19 22:40
Without -v, python does not fail.

If I wrap the message in _bootstrap_external._verbose_message with '!%a' % and in _bootstrap._verbose_message with '@%a' % (why is _verbose_message duplicated in _bootstrap and _bootstrap_external?), the output is:

...
# installing zipimport hook
@"import 'zipimport' # <class '_frozen_importlib.BuiltinImporter'>"
# installed zipimport hook
!'# /home/serhiy/py/cpy\udcffthon-3.5/Lib/encodings/__pycache__/__init__.cpython-35.pyc matches /home/serhiy/py/cpy\udcffthon-3.5/Lib/encodings/__init__.py'
!"# code object from '/home/serhiy/py/cpy\\udcffthon-3.5/Lib/encodings/__pycache__/__init__.cpython-35.pyc'"
!'# /home/serhiy/py/cpy\udcffthon-3.5/Lib/__pycache__/codecs.cpython-35.pyc matches /home/serhiy/py/cpy\udcffthon-3.5/Lib/codecs.py'
!"# code object from '/home/serhiy/py/cpy\\udcffthon-3.5/Lib/__pycache__/codecs.cpython-35.pyc'"
@"import '_codecs' # <class '_frozen_importlib.BuiltinImporter'>"
@"import 'codecs' # <_frozen_importlib_external.SourceFileLoader object at 0xb70b9aac>"
!'# /home/serhiy/py/cpy\udcffthon-3.5/Lib/encodings/__pycache__/aliases.cpython-35.pyc matches /home/serhiy/py/cpy\udcffthon-3.5/Lib/encodings/aliases.py'
!"# code object from '/home/serhiy/py/cpy\\udcffthon-3.5/Lib/encodings/__pycache__/aliases.cpython-35.pyc'"
@"import 'encodings.aliases' # <_frozen_importlib_external.SourceFileLoader object at 0xb70c81ac>"
@"import 'encodings' # <_frozen_importlib_external.SourceFileLoader object at 0xb70b96cc>"
!'# /home/serhiy/py/cpy\udcffthon-3.5/Lib/encodings/__pycache__/utf_8.cpython-35.pyc matches /home/serhiy/py/cpy\udcffthon-3.5/Lib/encodings/utf_8.py'
!"# code object from '/home/serhiy/py/cpy\\udcffthon-3.5/Lib/encodings/__pycache__/utf_8.cpython-35.pyc'"
@"import 'encodings.utf_8' # <_frozen_importlib_external.SourceFileLoader object at 0xb70ccd2c>"
@"import '_signal' # <class '_frozen_importlib.BuiltinImporter'>"
!'# /home/serhiy/py/cpy\udcffthon-3.5/Lib/encodings/__pycache__/latin_1.cpython-35.pyc matches /home/serhiy/py/cpy\udcffthon-3.5/Lib/encodings/latin_1.py'
!"# code object from '/home/serhiy/py/cpy\\udcffthon-3.5/Lib/encodings/__pycache__/latin_1.cpython-35.pyc'"
@"import 'encodings.latin_1' # <_frozen_importlib_external.SourceFileLoader object at 0xb70cf54c>"
!'# /home/serhiy/py/cpy\udcffthon-3.5/Lib/__pycache__/io.cpython-35.pyc matches /home/serhiy/py/cpy\udcffthon-3.5/Lib/io.py'
!"# code object from '/home/serhiy/py/cpy\\udcffthon-3.5/Lib/__pycache__/io.cpython-35.pyc'"
...

So a verbose message containing non-ASCII characters is written before codecs is imported.
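A small sketch of why the '%a' wrapping makes the messages printable, and why some lines in the output show a doubled backslash. The message templates below are simplified guesses at how the messages are built, not copies of the importlib source.

```python
path = "/home/serhiy/py/cpy\udcffthon-3.5/Lib/codecs.py"

# A message that embeds the path directly still contains the lone surrogate.
plain = "# %s matches" % path
wrapped = "!%a" % plain  # %a applies ascii(), escaping every non-ASCII character
wrapped.encode("ascii")  # succeeds: only ASCII characters remain
print(wrapped)

# A message that embeds repr(path) already holds the text "\udcff"
# (a backslash followed by "udcff"), so ascii() escapes the backslash again.
nested = "# code object from {!r}".format(path)
double = "!%a" % nested
print(double)
```

The first form corresponds to the single-backslash lines in the output above, the second to the lines with '\\udcff'.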
msg251918 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-09-30 09:22
Before the io module is imported, sys.stderr is stdprinter. It always encodes the written string to UTF-8. The proposed patch makes it use the backslashreplace error handler.

In the future, perhaps we could implement stdprinter in Python.
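The effect of the backslashreplace error handler can be illustrated in pure Python. This is a sketch only: encode_for_early_stderr is a hypothetical stand-in, and the real stdprinter is implemented in C.

```python
def encode_for_early_stderr(text):
    """Hypothetical stand-in for how a patched stdprinter might encode text."""
    return text.encode("utf-8", "backslashreplace")


message = "# /home/serhiy/py/cpy\udcffthon-3.5/Lib/io.py\n"
data = encode_for_early_stderr(message)
print(data)

# Characters that UTF-8 can represent are unaffected by the error handler.
assert encode_for_early_stderr("caf\u00e9") == "caf\u00e9".encode("utf-8")
```

Only the lone surrogate is replaced by an escape sequence, so the rest of the message stays readable and encoding never raises.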
msg251921 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2015-09-30 10:07
> Before the io module is imported, sys.stderr is stdprinter. It always encodes the written string to UTF-8. The proposed patch makes it use the backslashreplace error handler.

I like this solution. stdprinter is supposed to be replaced quickly.

But we may need something else to escape non-encodable characters in the filename when sys.stdout is a TextIOWrapper using the strict error handler.
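The concern about a TextIOWrapper with the strict error handler can be reproduced with an in-memory stream. A minimal sketch, assuming a UTF-8 stream; write_with is a hypothetical helper used only for this comparison.

```python
import io


def write_with(errors, text):
    """Write text through a UTF-8 TextIOWrapper using the given error handler."""
    raw = io.BytesIO()
    stream = io.TextIOWrapper(raw, encoding="utf-8", errors=errors, newline="\n")
    stream.write(text)
    stream.flush()
    return raw.getvalue()


filename = "cpy\udcffthon-3.5"

try:
    write_with("strict", filename)
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

escaped = write_with("backslashreplace", filename)
print(strict_failed, escaped)
```

A stream configured with errors="strict" raises on such a filename, while one configured with errors="backslashreplace" writes an escaped form instead.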
msg251924 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2015-09-30 12:52
New changeset 6347b154dd67 by Serhiy Storchaka in branch '3.4':
Issue #25182: The stdprinter (used as sys.stderr before the io module is
https://hg.python.org/cpython/rev/6347b154dd67

New changeset e8b6c6c433a4 by Serhiy Storchaka in branch '3.5':
Issue #25182: The stdprinter (used as sys.stderr before the io module is
https://hg.python.org/cpython/rev/e8b6c6c433a4

New changeset 0b0945c8de36 by Serhiy Storchaka in branch 'default':
Issue #25182: The stdprinter (used as sys.stderr before the io module is
https://hg.python.org/cpython/rev/0b0945c8de36
msg251925 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-09-30 12:54
Thank you for the review Victor.

> But we may need something else to escape non-encodable characters in the filename when sys.stdout is a TextIOWrapper using the strict error handler.

This is not related to this issue. sys.stderr uses backslashreplace.
msg251926 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2015-09-30 12:59
> This is not related to this issue. sys.stderr uses backslashreplace.

Ok, fine.

---
+    _errno = errno;
     Py_END_ALLOW_THREADS
+    Py_XDECREF(bytes);

     if (n < 0) {
-        if (errno == EAGAIN)
+        if (_errno == EAGAIN)
             Py_RETURN_NONE;
         PyErr_SetFromErrno(PyExc_IOError);
---

Hum, if you expect that errno can be modified by Py_XDECREF(bytes), you must restore the previous errno value before calling PyErr_SetFromErrno(), because that function reads errno rather than the saved _errno. This strategy is used in Python/fileutils.c.
msg251927 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2015-09-30 13:04
New changeset 2652c1798f7d by Victor Stinner in branch '3.4':
Issue #25182: Fix compilation on Windows
https://hg.python.org/cpython/rev/2652c1798f7d

New changeset 0eb26a4d5ffa by Victor Stinner in branch '3.5':
(Merge 3.4) Issue #25182: Fix compilation on Windows
https://hg.python.org/cpython/rev/0eb26a4d5ffa

New changeset d1090d733d39 by Victor Stinner in branch 'default':
(Merge 3.5) Issue #25182: Fix compilation on Windows
https://hg.python.org/cpython/rev/d1090d733d39
msg251928 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2015-09-30 13:07
Hum, the code didn't compile anymore on Windows. I took the opportunity to fix the errno issue that I saw.

Note: In fact, Python/fileutils.c is a little bit different. Functions like _Py_write() save errno to restore it later because the caller expects errno to be set.
msg251940 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-09-30 13:36
> Hum, the code didn't compile anymore on Windows. I took the opportunity to
> fix the errno issue that I saw.

Thank you Victor.
msg251957 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-09-30 17:45
> > This is not related to this issue. sys.stderr uses backslashreplace.
> Ok, fine.

But it is related to issue25183.
History
Date User Action Args
2022-04-11 14:58:21 admin set github: 69369
2015-09-30 17:45:29 serhiy.storchaka set messages: + msg251957
2015-09-30 13:36:15 serhiy.storchaka set messages: + msg251940
2015-09-30 13:07:33 vstinner set messages: + msg251928
2015-09-30 13:04:28 python-dev set messages: + msg251927
2015-09-30 12:59:05 vstinner set messages: + msg251926
2015-09-30 12:54:21 serhiy.storchaka set status: open -> closed; messages: + msg251925; assignee: serhiy.storchaka; resolution: fixed; stage: patch review -> resolved
2015-09-30 12:52:41 python-dev set nosy: + python-dev; messages: + msg251924
2015-09-30 10:07:31 vstinner set nosy: + vstinner; messages: + msg251921
2015-09-30 09:23:00 serhiy.storchaka set files: + stdprinter_backslashreplace.patch; messages: + msg251918; components: + Interpreter Core; keywords: + patch; stage: patch review
2015-09-21 12:02:51 Arfrever set nosy: + Arfrever
2015-09-19 22:40:19 serhiy.storchaka set messages: + msg251123
2015-09-19 22:10:41 brett.cannon set messages: + msg251122
2015-09-19 21:13:23 serhiy.storchaka link issue25181 dependencies
2015-09-19 20:48:46 serhiy.storchaka create