classification
Title: Encoding error running in subprocess with captured output
Type: behavior
Stage: resolved
Components: Interpreter Core, Library (Lib), Unicode, Windows
Versions: Python 3.8, Python 3.7, Python 3.6
process
Status: closed
Resolution: not a bug
Dependencies:
Superseder:
Assigned To:
Nosy List: eryksun, ezio.melotti, jkloth, paul.moore, steve.dower, tim.golden, vstinner, zach.ware
Priority: normal
Keywords:

Created on 2018-09-10 01:57 by jkloth, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Messages (3)
msg324904 - Author: Jeremy Kloth (jkloth) Date: 2018-09-10 01:57
When running Python via subprocess with captured output, an encoding error occurs when the child attempts to print a Unicode filename. The same does not happen when using os.spawnl(), where the child writes directly to the console.

Python 3.6.5 (v3.6.5:f59c0932b4, Mar 28 2018, 17:00:18) [MSC v.1900 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import os, sys, subprocess
>>> sys.stdout.encoding, sys.stdout.errors
('utf-8', 'surrogateescape')
>>> args = ['-u', '-c', "print('taqdir\\\u0634\u0645\u0627\u0631.py')"]
>>> os.spawnl(os.P_WAIT, sys.executable, '"%s"' % sys.executable, *args)
taqdir\شمار.py
0
>>> subprocess.run([sys.executable, *args], stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
CompletedProcess(args=['C:\\Program Files (x86)\\Microsoft Visual Studio\\Shared\\Python36_64\\python.exe', '-u', '-c', "print('taqdir\\شمار.py')"], returncode=1, stdout=b'Traceback (most recent call last):\r\n  File "<string>", line 1, in <module>\r\n  File "C:\\Program Files (x86)\\Microsoft Visual Studio\\Shared\\Python36_64\\lib\\encodings\\cp1252.py", line 19, in encode\r\n    return codecs.charmap_encode(input,self.errors,encoding_table)[0]\r\nUnicodeEncodeError: \'charmap\' codec can\'t encode characters in position 7-10: character maps to <undefined>\r\n')
msg324905 - Author: Jeremy Kloth (jkloth) Date: 2018-09-10 01:58
Related to issue34421
msg324911 - Author: Eryk Sun (eryksun) (Python triager) Date: 2018-09-10 03:55
The interpreter uses the system ANSI codepage for non-console files. In your case this is codepage 1252. In my current setup I can't reproduce this issue since I'm using the new (beta) support in Windows 10 to configure the ANSI codepage as UTF-8 (65001).
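
One way to see which encoding a child interpreter picks for a piped stdout is to ask it directly. The following is a minimal sketch, not part of the original report; the names `probe` and `child_encoding` are illustrative only. On a Windows system like the reporter's, without PYTHONIOENCODING set, the child would be expected to report the ANSI codepage rather than UTF-8.

```python
import locale
import subprocess
import sys

# Code run in the child: report the encoding chosen for its stdout.
# ascii() keeps the reply pure ASCII, so it survives any pipe encoding.
probe = "import sys; print(ascii(sys.stdout.encoding))"

result = subprocess.run(
    [sys.executable, "-c", probe],
    stdout=subprocess.PIPE,  # stdout is a pipe, not a console
    check=True,
)

# The child printed something like 'cp1252'; drop the newline and quotes.
child_encoding = result.stdout.decode("ascii").strip().strip("'")

print("parent locale encoding:", locale.getpreferredencoding(False))
print("child piped stdout encoding:", child_encoding)
```

Comparing the two printed values shows whether the child fell back to the locale (ANSI) encoding for the pipe.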

You can force standard I/O to use UTF-8 by setting the environment variable PYTHONIOENCODING. Also, if you want the CompletedProcess stdout decoded as text, in 3.6+ you can pass the parameter `encoding='utf-8'`. For example:

    environ = os.environ.copy()
    # Make the child interpreter encode its standard I/O as UTF-8.
    environ['PYTHONIOENCODING'] = 'utf-8'
    p = subprocess.run([sys.executable, *args], stdout=subprocess.PIPE,
                       stderr=subprocess.STDOUT, env=environ,
                       encoding='utf-8')
    print(p.stdout)
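
The same approach can be written as a self-contained script that checks the filename survives the round trip through the pipe. This is a sketch under assumptions: `run_utf8` is a hypothetical helper name, and the child code is built with an ASCII-only literal so the command line itself carries no non-ASCII characters.

```python
import os
import subprocess
import sys


def run_utf8(code):
    """Run `code` in a child interpreter with UTF-8 standard I/O.

    Hypothetical helper for illustration; not part of the subprocess module.
    """
    environ = os.environ.copy()
    environ["PYTHONIOENCODING"] = "utf-8"  # child encodes stdout/stderr as UTF-8
    return subprocess.run(
        [sys.executable, "-u", "-c", code],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        env=environ,
        encoding="utf-8",  # parent decodes the captured bytes as UTF-8
    )


# The filename from the report, written with escapes.
name = "taqdir\\\u0634\u0645\u0627\u0631.py"

# {name!a} embeds an ASCII-only literal of the name in the child's source.
p = run_utf8(f"print({name!a})")

if p.returncode == 0 and p.stdout.strip() == name:
    print("filename round-tripped through the pipe")
else:
    print("child failed or output differs:", ascii(p.stdout))
```

Both sides have to agree: the environment variable controls how the child encodes, while the `encoding` argument controls how the parent decodes.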
History
Date                 User     Action  Args
2022-04-11 14:59:05  admin    set     github: 78799
2022-03-11 18:14:46  eryksun  link    issue46988 superseder
2018-09-10 03:55:57  eryksun  set     status: open -> closed; type: behavior; nosy: + eryksun; messages: + msg324911; resolution: not a bug; stage: resolved
2018-09-10 01:58:38  jkloth   set     messages: + msg324905
2018-09-10 01:58:00  jkloth   create