classification
Title: asyncio.create_subprocess_* should honor `encoding`
Type: enhancement Stage:
Components: asyncio Versions: Python 3.8, Python 3.7, Python 3.6
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: Dima.Tisnek, adampl, njs, toriningen, yselivanov
Priority: normal Keywords:

Created on 2017-07-31 05:01 by toriningen, last changed 2018-04-15 17:34 by adampl.

Files
File name Uploaded Description Edit
asyncio_encoding_test.py adampl, 2018-04-11 14:31 another demo
Messages (6)
msg299537 - (view) Author: Alex (toriningen) Date: 2017-07-31 05:01
Regardless of the value of `encoding`, StreamReaders returned for the `asyncio.subprocess.Process`'s `stdout` and `stderr` would be in binary mode.

    import sys
    import asyncio
    import subprocess

    async def main():
        sp = await asyncio.create_subprocess_exec('ls', '-la',
                                                  stdin=None,
                                                  stdout=subprocess.PIPE,
                                                  encoding='utf8')
        await sp.wait()
        data = await sp.stdout.read()
        print(data)
        print(isinstance(data, bytes))

    loop = asyncio.get_event_loop()
    loop.run_until_complete(main())

There are two naive solutions:

- `create_subprocess_*` could explicitly raise when provided any `encoding` and `errors` (like it now does with `universal_newlines`).

- or implement encoding conversions for StreamReaders (and StreamWriters), and forward those.

Of course it would be better to have conversions, but I don't know how those will have to deal with decoding partially available data - e.g. returning in the middle of surrogate pair, or having only half a codepoint for UTF16. There should likely cache partial data to process at the next read, but this would require rewriting codecs to support partial decode... seems like it's not that easy :(
msg310223 - (view) Author: Dima Tisnek (Dima.Tisnek) Date: 2018-01-18 06:28
I'd love to see universal_newlines=True in asyncio.subprocess.
msg310224 - (view) Author: Nathaniel Smith (njs) * (Python committer) Date: 2018-01-18 06:51
Python's codecs already support partial decode, exactly to handle this kind of case. See codecs.getincremental{en,de}coder and

https://docs.python.org/3/library/codecs.html#incremental-encoding-and-decoding
msg315192 - (view) Author: Adam (adampl) * Date: 2018-04-11 14:31
It occurs both on Python 3.6 and 3.7 RC, so maybe it should be fixed in the 3.7 release.
msg315205 - (view) Author: Nathaniel Smith (njs) * (Python committer) Date: 2018-04-11 23:01
@adampl: The first step is for someone (possibly you :-)) to write a patch and submit it as a PR against the Python master branch. Then once the actual change is figured out, we can have the discussion about which releases to backport it to.
msg315336 - (view) Author: Adam (adampl) * Date: 2018-04-15 17:34
After reading the docs more carefully, it's now plain to me that text encoding is not supported yet, so actually it's not a bug :)

However the docs should be improved (and then an assertion could be added too) to prevent people from falling into this trap. Only the `universal_newlines` parameter is explicitly mentioned, while others (including `encoding` and `errors``) are passed to `subprocess.Popen`, which falsely suggests that they should work fine. Moreover, the `std*` properties of the subprocess have a `_transport._pipe.encoding` set to the encoding passed to `asyncio.create_subprocess_*`, but apparently it's not used at all. IMHO this is too messy.

Alternatively this option could be implemented, which would require a new kind of StreamReader and StreamWriter.
History
Date User Action Args
2018-04-15 17:34:02adamplsettype: behavior -> enhancement
versions: + Python 3.8
messages: + msg315336
title: asyncio.create_subprocess_* do not honor `encoding` -> asyncio.create_subprocess_* should honor `encoding`
2018-04-11 23:01:21njssetmessages: + msg315205
2018-04-11 14:31:32adamplsetfiles: + asyncio_encoding_test.py
versions: + Python 3.7
nosy: + adampl

messages: + msg315192

type: behavior
2018-01-18 06:51:19njssetnosy: + njs
messages: + msg310224
2018-01-18 06:28:16Dima.Tisneksetnosy: + Dima.Tisnek
messages: + msg310223
2017-07-31 05:02:20toriningensetnosy: + yselivanov

components: + asyncio
versions: + Python 3.6
2017-07-31 05:01:57toriningencreate