classification
Title: asyncio subprocess accepts string as parameter which leads to UnicodeEncodeError
Type: Stage: resolved
Components: asyncio Versions: Python 3.8, Python 3.7, Python 3.6, Python 3.5
process
Status: closed Resolution: not a bug
Dependencies: Superseder:
Assigned To: Nosy List: asvetlov, natim, vstinner, yselivanov
Priority: normal Keywords:

Created on 2018-10-18 08:50 by natim, last changed 2018-10-18 12:44 by natim. This issue is now closed.

Files
File name Uploaded Description
demo.py natim, 2018-10-18 10:53
Messages (15)
msg327945 - (view) Author: Rémy Hubscher [:natim] (natim) * Date: 2018-10-18 08:50
asyncio.create_subprocess_exec accepts a list of str as parameters, which leads to a UnicodeEncodeError. I think it should accept only bytes, shouldn't it?
msg327953 - (view) Author: Andrew Svetlov (asvetlov) * (Python committer) Date: 2018-10-18 09:50
A list of strings works both on my local Linux box and in the CPython test suite.

Please provide more info about the error. A stack trace would help.
msg327954 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-10-18 09:54
Hi Remy,

> asyncio.create_subprocess_exec accepts a list of str as parameters, which leads to a UnicodeEncodeError. I think it should accept only bytes, shouldn't it?

Can you elaborate? On which OS? What is your error message? Can you paste a traceback?
msg327955 - (view) Author: Rémy Hubscher [:natim] (natim) * Date: 2018-10-18 10:04
> List of strings works on both my local Linux box and CPython test suite.

Indeed, that's why I posted this bug report: in my opinion it should work only with byte strings.

> Can you elaborate? On which OS? What is your error message? Can you paste a traceback?

If you try to send a UTF-8 string on a Linux box, for instance, you might get a UnicodeEncodeError.

Let me try to provide you with a script to reproduce this error.
msg327962 - (view) Author: Rémy Hubscher [:natim] (natim) * Date: 2018-10-18 10:53
I thought this would be sufficient to actually reproduce the issue.
However, it seems that if the system encoding is UTF-8, it works properly.

Here is the traceback I had:

```
UnicodeEncodeError: 'ascii' codec can't encode character '\xe9' in position 69: ordinal not in range(128)
  File "worker.py", line 393, in <module>
    return_code = loop.run_until_complete(main(loop))
  File "asyncio/base_events.py", line 467, in run_until_complete
    return future.result()
  File "worker.py", line 346, in main
    '-f mp4', '-o', '{}/{}.mp4'.format(download_tempdir, video_id))
  File "worker.py", line 268, in run_command
    proc = await create
  File "asyncio/subprocess.py", line 225, in create_subprocess_exec
    stderr=stderr, **kwds)
  File "asyncio/base_events.py", line 1191, in subprocess_exec
    bufsize, **kwargs)
  File "asyncio/unix_events.py", line 191, in _make_subprocess_transport
    **kwargs)
  File "asyncio/base_subprocess.py", line 39, in __init__
    stderr=stderr, bufsize=bufsize, **kwargs)
  File "asyncio/unix_events.py", line 697, in _start
    universal_newlines=False, bufsize=bufsize, **kwargs)
  File "python3.6/subprocess.py", line 707, in __init__
    restore_signals, start_new_session)
  File "python3.6/subprocess.py", line 1267, in _execute_child
    restore_signals, start_new_session, preexec_fn)
```
msg327964 - (view) Author: Rémy Hubscher [:natim] (natim) * Date: 2018-10-18 11:03
I am adding the following info:

If I run the following on the Docker image where I got the error I get:

```
import sys
import locale

print(sys.getdefaultencoding())
print(locale.getpreferredencoding())
```

utf-8
ANSI_X3.4-1968

While if I run it on my machine I get:

utf-8
UTF-8

I don't know how to force the use of the latter locally to reproduce.

Setting LC_ALL=C and LANG=C didn't do the trick.
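
For diagnosing this kind of mismatch, a slightly fuller report may help. The sketch below also prints sys.getfilesystemencoding(), which is the encoding os.fsencode() relies on, and the UTF-8 Mode flag (only available from Python 3.7). The helper name `encoding_report` is made up for this illustration.

```python
import locale
import sys


def encoding_report():
    """Collect the encodings that matter when str arguments go to a child process."""
    return {
        # Encoding used by str.encode() / bytes.decode() when none is given.
        "default": sys.getdefaultencoding(),
        # Encoding derived from the current locale settings.
        "preferred": locale.getpreferredencoding(False),
        # Encoding used by os.fsencode() for file names and arguments.
        "filesystem": sys.getfilesystemencoding(),
        # 1 if the UTF-8 Mode is enabled; the flag only exists on Python 3.7+.
        "utf8_mode": getattr(sys.flags, "utf8_mode", None),
    }


if __name__ == "__main__":
    for name, value in encoding_report().items():
        print(name, "=", value)
```

If "filesystem" reports an ASCII encoding, non-ASCII str arguments are likely to fail in the same way as in the traceback above.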
msg327965 - (view) Author: Rémy Hubscher [:natim] (natim) * Date: 2018-10-18 11:06
Here we go:

```
$ python3.7 demo.py 
utf-8
UTF-8
Traceback (most recent call last):
  File "demo.py", line 21, in <module>
    asyncio.run(main())
  File "/usr/lib/python3.7/asyncio/runners.py", line 43, in run
    return loop.run_until_complete(main)
  File "/usr/lib/python3.7/asyncio/base_events.py", line 568, in run_until_complete
    return future.result()
  File "demo.py", line 14, in main
    sys.stdout.write(out.decode('utf-8'))
UnicodeEncodeError: 'ascii' codec can't encode character '\xe9' in position 1: ordinal not in range(128)
```
msg327966 - (view) Author: Andrew Svetlov (asvetlov) * (Python committer) Date: 2018-10-18 11:07
I think you'll get the same error with a `subprocess.run()` call if your current locale is not UTF-8.

I don't recall the details, but the Internet has a lot of info about setting the locale per user and system-wide.
msg327967 - (view) Author: Rémy Hubscher [:natim] (natim) * Date: 2018-10-18 11:08
I believe Python 3.7 brings explicit unicode encoding/decoding.

If the create_subprocess_exec method can fail depending on the environment, I believe we should not try to encode the command-line arguments but rather require them to be bytes.
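
For reference, create_subprocess_exec() accepts bytes arguments as well as str, so a caller who wants to control the encoding can encode explicitly. A minimal sketch, assuming a POSIX system; the helper name `echo_arg` and the child one-liner are illustrative only:

```python
import asyncio
import sys

# Child program: write its first argument back to stdout as raw bytes.
CHILD = "import os, sys; sys.stdout.buffer.write(os.fsencode(sys.argv[1]))"


async def echo_arg(arg: bytes) -> bytes:
    """Pass an already-encoded argument to a child process and return what it received."""
    proc = await asyncio.create_subprocess_exec(
        sys.executable, "-c", CHILD, arg,
        stdout=asyncio.subprocess.PIPE,
    )
    out, _ = await proc.communicate()
    return out


if __name__ == "__main__":
    # The caller picks the encoding, so this argument is not encoded
    # with the filesystem encoding of the parent process.
    print(asyncio.run(echo_arg("café".encode("utf-8"))))
```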
msg327970 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-10-18 11:57
I added the UTF-8 Mode for you, for the Docker use case: python3.7 -X utf8. Using that, Python ignores your locale and speaks UTF-8.

What is your locale? Try the "locale" command.
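
To illustrate the effect of the UTF-8 Mode, the sketch below starts a child interpreter with -X utf8 under LC_ALL=C and asks it for its filesystem encoding. It assumes Python 3.7 or newer; the helper name `child_fs_encoding` is hypothetical.

```python
import os
import subprocess
import sys


def child_fs_encoding(*interpreter_options):
    """Return sys.getfilesystemencoding() as seen by a child interpreter run with LC_ALL=C."""
    env = dict(os.environ, LC_ALL="C")
    # Remove any inherited PYTHONUTF8 setting so it cannot influence the child.
    env.pop("PYTHONUTF8", None)
    cmd = [sys.executable, *interpreter_options,
           "-c", "import sys; print(sys.getfilesystemencoding())"]
    result = subprocess.run(cmd, env=env, stdout=subprocess.PIPE, check=True)
    return result.stdout.decode("ascii").strip()


if __name__ == "__main__":
    print(child_fs_encoding("-X", "utf8"))
```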
msg327974 - (view) Author: Rémy Hubscher [:natim] (natim) * Date: 2018-10-18 12:40
Here are the locale settings:

```
LANG=
LANGUAGE=
LC_CTYPE="POSIX"
LC_NUMERIC="POSIX"
LC_TIME="POSIX"
LC_COLLATE="POSIX"
LC_MONETARY="POSIX"
LC_MESSAGES="POSIX"
LC_PAPER="POSIX"
LC_NAME="POSIX"
LC_ADDRESS="POSIX"
LC_TELEPHONE="POSIX"
LC_MEASUREMENT="POSIX"
LC_IDENTIFICATION="POSIX"
LC_ALL=
```
msg327975 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-10-18 12:41
> LC_CTYPE="POSIX"

I modified Python 3.7.1 to enable the UTF-8 Mode when the LC_CTYPE is "POSIX". In Python 3.7.0, the UTF-8 Mode is only enabled if the LC_CTYPE is "C".
msg327976 - (view) Author: Rémy Hubscher [:natim] (natim) * Date: 2018-10-18 12:43
Unicode is complicated; the answer is somewhere here: https://unicodebook.readthedocs.io/

Sorry for the bother, I thought it was a bug but apparently it's a feature. Thank you for your help, and thank you for making Python better.
msg327977 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-10-18 12:44
This issue is not an asyncio bug: the bug occurs in subprocess.

The bug is not a subprocess bug: subprocess works as expected, it encodes Unicode with sys.getfilesystemencoding() (see os.fsencode()).

The bug is that you use non-ASCII strings whereas your filesystem encoding is ASCII.

You have different options to fix *your* issue:

* Use a different locale which uses the UTF-8 encoding
* Enable the Python 3.7 UTF-8 Mode
* Wait for Python 3.7.1 (which automatically enables the UTF-8 Mode for LC_CTYPE="POSIX")

Note: You might want to read my ebook http://unicodebook.readthedocs.io/ which explains how to deal with Unicode.
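
The encoding step described above can be approximated in a few lines. This is a simplified sketch, not the actual subprocess code: the helper `encode_argument` is hypothetical and takes the encoding as a parameter, whereas os.fsencode() uses sys.getfilesystemencoding(). The surrogateescape error handler is the one used on POSIX.

```python
def encode_argument(arg, encoding):
    """Encode one command line argument roughly the way os.fsencode() does on POSIX."""
    if isinstance(arg, bytes):
        # bytes are passed through untouched.
        return arg
    # surrogateescape round-trips undecodable bytes, but it cannot
    # encode a real non-ASCII character with an ASCII codec.
    return arg.encode(encoding, "surrogateescape")


if __name__ == "__main__":
    print(encode_argument("café", "utf-8"))
    try:
        encode_argument("café", "ascii")
    except UnicodeEncodeError as exc:
        print("failed:", exc)
```

With an ASCII filesystem encoding the second call fails with the same kind of UnicodeEncodeError as in the reported traceback, while bytes arguments are never re-encoded.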
msg327978 - (view) Author: Rémy Hubscher [:natim] (natim) * Date: 2018-10-18 12:44
> I modified Python 3.7.1 to enable the UTF-8 Mode when the LC_CTYPE is "POSIX". In Python 3.7.0, the UTF-8 Mode is only enabled if the LC_CTYPE is "C"

Ok works for me thanks :)
History
Date User Action Args
2018-10-18 12:44:24 natim set messages: + msg327978
2018-10-18 12:44:22 vstinner set messages: + msg327977
2018-10-18 12:43:29 natim set status: open -> closed; resolution: not a bug; messages: + msg327976; stage: resolved
2018-10-18 12:41:17 vstinner set messages: + msg327975
2018-10-18 12:40:04 natim set messages: + msg327974
2018-10-18 11:57:42 vstinner set messages: + msg327970
2018-10-18 11:08:40 natim set messages: + msg327967
2018-10-18 11:07:30 asvetlov set messages: + msg327966
2018-10-18 11:06:49 natim set messages: + msg327965
2018-10-18 11:03:03 natim set messages: + msg327964
2018-10-18 10:53:44 natim set files: + demo.py; messages: + msg327962
2018-10-18 10:04:25 natim set messages: + msg327955
2018-10-18 09:54:30 vstinner set nosy: + vstinner; messages: + msg327954
2018-10-18 09:50:45 asvetlov set messages: + msg327953
2018-10-18 08:50:17 natim create