This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: asyncio.create_subprocess_* should honor `encoding`
Type: enhancement Stage: resolved
Components: asyncio Versions: Python 3.8, Python 3.7, Python 3.6
process
Status: closed Resolution: duplicate
Dependencies: Superseder: Docs: asyncio.loop.subprocess_exec documentation is confusing, it's not clear how to inherit stdin, stdout or stderr in the subprocess
View: 36686
Assigned To: Nosy List: Dima.Tisnek, adampl, asvetlov, njs, unused-346748, yselivanov
Priority: normal Keywords:

Created on 2017-07-31 05:01 by unused-346748, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
asyncio_encoding_test.py adampl, 2018-04-11 14:31 another demo
Messages (7)
msg299537 - (view) Author: Alex (unused-346748) Date: 2017-07-31 05:01
Regardless of the value of `encoding`, StreamReaders returned for the `asyncio.subprocess.Process`'s `stdout` and `stderr` would be in binary mode.

    import sys
    import asyncio
    import subprocess

    async def main():
        sp = await asyncio.create_subprocess_exec('ls', '-la',
                                                  stdin=None,
                                                  stdout=subprocess.PIPE,
                                                  encoding='utf8')
        await sp.wait()
        data = await sp.stdout.read()
        print(data)
        print(isinstance(data, bytes))

    loop = asyncio.get_event_loop()
    loop.run_until_complete(main())

There are two naive solutions:

- `create_subprocess_*` could explicitly raise when provided any `encoding` and `errors` (like it now does with `universal_newlines`).

- or implement encoding conversions for StreamReaders (and StreamWriters), and forward those.

Of course it would be better to have conversions, but I don't know how those will have to deal with decoding partially available data - e.g. returning in the middle of surrogate pair, or having only half a codepoint for UTF16. There should likely cache partial data to process at the next read, but this would require rewriting codecs to support partial decode... seems like it's not that easy :(
msg310223 - (view) Author: Dima Tisnek (Dima.Tisnek) * Date: 2018-01-18 06:28
I'd love to see universal_newlines=True in asyncio.subprocess.
msg310224 - (view) Author: Nathaniel Smith (njs) * (Python committer) Date: 2018-01-18 06:51
Python's codecs already support partial decode, exactly to handle this kind of case. See codecs.getincremental{en,de}coder and

https://docs.python.org/3/library/codecs.html#incremental-encoding-and-decoding
msg315192 - (view) Author: Adam (adampl) * Date: 2018-04-11 14:31
It occurs both on Python 3.6 and 3.7 RC, so maybe it should be fixed in the 3.7 release.
msg315205 - (view) Author: Nathaniel Smith (njs) * (Python committer) Date: 2018-04-11 23:01
@adampl: The first step is for someone (possibly you :-)) to write a patch and submit it as a PR against the Python master branch. Then once the actual change is figured out, we can have the discussion about which releases to backport it to.
msg315336 - (view) Author: Adam (adampl) * Date: 2018-04-15 17:34
After reading the docs more carefully, it's now plain to me that text encoding is not supported yet, so actually it's not a bug :)

However the docs should be improved (and then an assertion could be added too) to prevent people from falling into this trap. Only the `universal_newlines` parameter is explicitly mentioned, while others (including `encoding` and `errors``) are passed to `subprocess.Popen`, which falsely suggests that they should work fine. Moreover, the `std*` properties of the subprocess have a `_transport._pipe.encoding` set to the encoding passed to `asyncio.create_subprocess_*`, but apparently it's not used at all. IMHO this is too messy.

Alternatively this option could be implemented, which would require a new kind of StreamReader and StreamWriter.
msg344276 - (view) Author: Andrew Svetlov (asvetlov) * (Python committer) Date: 2019-06-02 11:14
encoding was explicitly disabled by #36686
Now its usage generates ValueError.

asyncio stdio/stdout/stderr streams are bytes.
If you want to propose a patch that supports text streams -- please create another issue for that.
History
Date User Action Args
2022-04-11 14:58:49adminsetgithub: 75270
2019-06-02 11:14:24asvetlovsetstatus: open -> closed

superseder: Docs: asyncio.loop.subprocess_exec documentation is confusing, it's not clear how to inherit stdin, stdout or stderr in the subprocess

nosy: + asvetlov
messages: + msg344276
resolution: duplicate
stage: resolved
2018-04-15 17:34:02adamplsettype: behavior -> enhancement
versions: + Python 3.8
messages: + msg315336
title: asyncio.create_subprocess_* do not honor `encoding` -> asyncio.create_subprocess_* should honor `encoding`
2018-04-11 23:01:21njssetmessages: + msg315205
2018-04-11 14:31:32adamplsetfiles: + asyncio_encoding_test.py
versions: + Python 3.7
nosy: + adampl

messages: + msg315192

type: behavior
2018-01-18 06:51:19njssetnosy: + njs
messages: + msg310224
2018-01-18 06:28:16Dima.Tisneksetnosy: + Dima.Tisnek
messages: + msg310223
2017-07-31 05:02:20unused-346748setnosy: + yselivanov

components: + asyncio
versions: + Python 3.6
2017-07-31 05:01:57unused-346748create