
classification
Title: Implement zero copy writes in SelectorSocketTransport in asyncio
Type: resource usage
Stage: patch review
Components: asyncio
Versions: Python 3.11
process
Status: open
Resolution:
Dependencies:
Superseder:
Assigned To:
Nosy List: asvetlov, gvanrossum, jakirkham, kumaraditya, yselivanov
Priority: normal
Keywords: patch

Created on 2022-03-14 09:18 by kumaraditya, last changed 2022-04-11 14:59 by admin.

Pull Requests
URL Status Linked Edit
PR 31871 open kumaraditya, 2022-03-14 13:40
Messages (2)
msg415124 - Author: Kumar Aditya (kumaraditya) (Python triager) Date: 2022-03-14 09:18
Currently, the _SelectorSocketTransport transport copies the data before sending it. For large payloads, this can create copies that are multiple gigabytes in size before anything is sent.

Script demonstrating current behavior:

-------------------------------------------------------------------------

import asyncio
import memory_profiler

@memory_profiler.profile
async def handle_echo(reader: asyncio.StreamReader, writer: asyncio.StreamWriter):
    data = b'x' * 1024 * 1024 * 1000 # 1000 MiB payload
    writer.write(data)
    await writer.drain()
    writer.close()

async def main():
    server = await asyncio.start_server(
        handle_echo, '127.0.0.1', 8888)

    addrs = ', '.join(str(sock.getsockname()) for sock in server.sockets)
    print(f'Serving on {addrs}')

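    async with server:
        # start_server() already begins accepting connections by default,
        # so no separate start_serving() task is needed here.
        reader, writer = await asyncio.open_connection('127.0.0.1', 8888)
        while True:
            # Drain the payload in 100 MiB reads until the server closes.
            data = await reader.read(1024 * 1024 * 100)
            if not data:
                break
        writer.close()
        await writer.wait_closed()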

asyncio.run(main())
-------------------------------------------------------------------------

Memory profile result:
------------------------------------------------------------------------
Filename: test.py

Line #    Mem usage    Increment  Occurrences   Line Contents
=============================================================
     4     17.7 MiB     17.7 MiB           1   @memory_profiler.profile
     5                                         async def handle_echo(reader: asyncio.StreamReader, writer: asyncio.StreamWriter):
     6   1017.8 MiB   1000.1 MiB           1       data = b'x' * 1024 * 1024 * 1000 # 1000 MiB payload
     7   2015.3 MiB    997.5 MiB           1       writer.write(data)
     8   2015.3 MiB   -988.1 MiB           2       await writer.drain()
     9   1027.1 MiB   -988.1 MiB           1       writer.close()

------------------------------------------------------------------------
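
The jump of roughly 1000 MiB at `writer.write(data)` in the profile above is what one would expect if the payload is duplicated into a separate buffer. The following is a minimal sketch, not the transport's actual code, that compares the two strategies in isolation using the standard library's tracemalloc. The helpers `peak_allocation`, `buffer_by_copy` and `buffer_by_view` are hypothetical names introduced only for this illustration.

```python
import tracemalloc


def peak_allocation(func):
    """Return the peak number of bytes allocated while func() runs."""
    tracemalloc.start()
    try:
        func()
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak


def buffer_by_copy(data):
    # Hypothetical stand-in for a buffer that duplicates the payload.
    buffer = bytearray()
    buffer.extend(data)
    return buffer


def buffer_by_view(data):
    # Hypothetical alternative: a view that shares the caller's memory.
    return memoryview(data)


if __name__ == "__main__":
    # 8 MiB payload, allocated before tracing starts so it is not counted.
    payload = b"x" * (8 * 1024 * 1024)
    copied = peak_allocation(lambda: buffer_by_copy(payload))
    viewed = peak_allocation(lambda: buffer_by_view(payload))
    print(f"copy: {copied / 2**20:.1f} MiB, view: {viewed / 2**20:.3f} MiB")
```

The copying variant should report at least the payload size, while the view variant should only account for the small memoryview object itself.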

To make writes zero-copy, the transport can rely on Python's buffer protocol and keep memoryviews of the data instead of copies, which saves RAM. The writelines method currently joins all the data before sending, whereas it could use `socket.sendmsg` to be more memory efficient.
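
As a rough illustration of the idea above, here is a sketch of a gather write on a plain blocking socket. It is not the proposed patch and does not touch asyncio. `send_buffers` and `MAX_IOV` are hypothetical names, and the cap of 64 views per call is an arbitrary assumption chosen to stay well below typical platform limits. `socket.sendmsg` is not available on every platform (notably Windows).

```python
import itertools
import socket
from collections import deque

MAX_IOV = 64  # assumed per-call cap on the number of buffers (hypothetical)


def send_buffers(sock, buffers):
    """Send all buffers without joining them into one bytes object.

    Every buffer is wrapped in a memoryview, so trimming off the part that
    has already been sent creates a new view over the same memory instead
    of copying the remaining bytes.
    """
    pending = deque()
    for buf in buffers:
        view = memoryview(buf)
        if view.nbytes:  # skip empty buffers
            pending.append(view.cast("B"))

    total = 0
    while pending:
        # sendmsg() accepts an iterable of bytes-like objects.
        sent = sock.sendmsg(list(itertools.islice(pending, MAX_IOV)))
        total += sent
        # Drop fully sent views and trim the partially sent one.
        while sent:
            head = pending[0]
            if sent >= len(head):
                sent -= len(head)
                pending.popleft()
            else:
                pending[0] = head[sent:]  # new view, same underlying memory
                sent = 0
    return total


if __name__ == "__main__":
    left, right = socket.socketpair()
    with left, right:
        count = send_buffers(left, [b"hello ", bytearray(b"zero-copy "), b"world"])
        left.shutdown(socket.SHUT_WR)
        received = b""
        while chunk := right.recv(1024):
            received += chunk
    print(count, received)
```

A non-blocking transport would additionally have to handle BlockingIOError and resume from the remaining views once the socket becomes writable again, which is part of why the real change is more involved than this sketch.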


Links:

- writelines - https://github.com/python/cpython/blob/2153daf0a02a598ed5df93f2f224c1ab2a2cca0d/Lib/asyncio/transports.py#L116

- socket.sendmsg - https://docs.python.org/3/library/socket.html#socket.socket.sendmsg

- memory_profiler - https://pypi.org/project/memory-profiler/
msg415131 - Author: Andrew Svetlov (asvetlov) (Python committer) Date: 2022-03-14 11:57
Known problem, PR is welcome!
I expect the fix is not trivial.
History
Date                 User         Action  Args
2022-04-11 14:59:57  admin        set     github: 91166
2022-03-21 23:44:35  jakirkham    set     nosy: + jakirkham
2022-03-17 13:11:02  asvetlov     link    issue40007 superseder
2022-03-14 13:40:36  kumaraditya  set     keywords: + patch
                                          stage: patch review
                                          pull_requests: + pull_request29970
2022-03-14 11:57:47  asvetlov     set     messages: + msg415131
2022-03-14 09:18:41  kumaraditya  create