classification
Title: asyncio ssl memory leak
Type: resource usage Stage: resolved
Components: asyncio Versions: Python 3.7, Python 3.6
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: asvetlov, cnpeterson, fantix, miss-islington, thehesiod, yselivanov
Priority: normal Keywords: patch

Created on 2018-09-19 22:41 by thehesiod, last changed 2019-03-25 06:10 by thehesiod. This issue is now closed.

Files
File name Uploaded Description Edit
memory_usage.png cnpeterson, 2018-12-20 22:32 Memory Usage
Pull Requests
URL Status Linked Edit
PR 12386 merged fantix, 2019-03-17 22:18
PR 12387 merged miss-islington, 2019-03-17 22:51
PR 12393 closed fantix, 2019-03-18 04:39
Messages (8)
msg325811 - (view) Author: Alexander Mohr (thehesiod) * Date: 2018-09-19 22:41
I've been trying to track down a leak in aiohttp: https://github.com/aio-libs/aiohttp/issues/3010

it seems like this leak now occurs with raw asyncio SSL sockets.

when the gist script is run like so: python3.7 `which mprof` run --interval=1 ~/dev/test_leak.py -test asyncio_test

it slowly leaks memory.  This is effectively doing the following:

URLS = {
    'https://s3.us-west-2.amazonaws.com/archpi.dabase.com/style.css': {
        'method': 'get',
        'headers': {'User-Agent': 'Botocore/1.8.21 Python/3.6.4 Darwin/17.5.0', 'X-Amz-Date': '20180518T025044Z', 'X-Amz-Content-SHA256': 'e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855', 'Authorization': f'AWS4-HMAC-SHA256 Credential={CREDENTIAL}/20180518/us-west-2/s3/aws4_request, SignedHeaders=host;x-amz-content-sha256;x-amz-date, Signature=ae552641b9aa9a7a267fcb4e36960cd5863e55d91c9b45fd39b30fdcd2e81489', 'Accept-Encoding': 'identity'}
    },

    'https://s3.ap-southeast-1.amazonaws.com/archpi.dabase.com/doesnotexist': {
        'method': 'GET' if sys.argv[1] == 'get_object' else 'HEAD',
        'headers': {'User-Agent': 'Botocore/1.8.21 Python/3.6.4 Darwin/17.5.0', 'X-Amz-Date': '20180518T025221Z', 'X-Amz-Content-SHA256': 'e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855', 'Authorization': f'AWS4-HMAC-SHA256 Credential={CREDENTIAL}/20180518/ap-southeast-1/s3/aws4_request, SignedHeaders=host;x-amz-content-sha256;x-amz-date, Signature=7a7675ef6d70cb647ed59e02d532ffa80d437fb03976d8246ea9ef102d118794', 'Accept-Encoding': 'identity'}
    }
}


class HttpClient(asyncio.streams.FlowControlMixin):
    transport = None

    def __init__(self, *args, **kwargs):
        self.__url = kwargs.pop('url')
        self.__logger = logging.getLogger()
        super().__init__()

    def connection_made(self, transport):
        self.transport = transport

        url_parts = urlparse(self.__url)
        entry = URLS[self.__url]

        body = f'{entry["method"]} {url_parts.path} HTTP/1.1\r\nAccept: */*\r\nHost: {url_parts.hostname}\r\n'
        for name, value in entry['headers'].items():
            body += f'{name}: {value}\r\n'

        body += '\r\n'
        self.transport.write(body.encode('ascii'))
        self.__logger.info(f'data sent: {body}')

    def data_received(self, data):
        self.__logger.info(f'data received: {data}')

        self.transport.close()
        # asyncio.get_event_loop().call_later(1.0, )

    def eof_received(self):
        self.__logger.info('eof_received')

    def connection_lost(self, exc):
        self.__logger.info(f'connection lost: {exc}')
        super().connection_lost(exc)

    @classmethod
    def create_factory(cls, url: str):
        def factory(*args, **kwargs):
            return cls(*args, url=url, **kwargs)

        return factory


async def test_asyncio(ssl_context):
    loop = asyncio.get_event_loop()

    url = 'https://s3.ap-southeast-1.amazonaws.com/archpi.dabase.com/doesnotexist'
    url_parts = urlparse(url)
    port = url_parts.port or (80 if url_parts.scheme == 'http' else 443)
    infos = await loop.getaddrinfo(url_parts.hostname, port, family=socket.AF_INET)
    family, type, proto, canonname, sockaddr = infos[0]
    await loop.create_connection(HttpClient.create_factory(url), sockaddr[0], port, ssl=ssl_context, family=family, proto=proto, flags=socket.AI_NUMERICHOST, server_hostname=url_parts.hostname, local_addr=None)


async def asyncio_test():
    ssl_context = ssl.create_default_context()

    while True:
        await test_asyncio(ssl_context)


await asyncio_test()
msg325813 - (view) Author: Yury Selivanov (yselivanov) * (Python committer) Date: 2018-09-19 22:44
What is "raw asyncio SSL sockets"? We don't use SSL sockets in asyncio, we use SSL Memory BIO.  Do you think that some SSL context objects aren't being properly released?

BTW, can you see the leak when run under uvloop?
msg325817 - (view) Author: Alexander Mohr (thehesiod) * Date: 2018-09-19 23:10
sorry, by "raw" I mean in the context of aiohttp, so just using the normal python ssl context and asyncio sockets.  I don't think it's an object not getting GC'd because I didn't see any increase on object counts, nor leaks per tracemalloc.  I think it's some low-level native memory leaked by openssl.

I've updated the gist w/ uvloop and ran with it and still get a leak, see gist + aiohttp issue for plot
msg325819 - (view) Author: Yury Selivanov (yselivanov) * (Python committer) Date: 2018-09-19 23:13
Would you be able to test uvloop/master branch?  Current uvloop 0.11.x uses pretty much the asyncio implementation; the master branch has a completely rewritten SSL layer.  If the master branch has the leak it might mean that the root cause is indeed in either the ssl module or openssl itself.
msg332273 - (view) Author: Chase Peterson (cnpeterson) * Date: 2018-12-20 22:32
I ran the code snippet below using uvloop/master in a docker container. As it ran, the container continually leaked memory. I included a graph with the memory usage.

Environment:
# cat /etc/*-release
PRETTY_NAME="Debian GNU/Linux 9 (stretch)"
NAME="Debian GNU/Linux"
VERSION_ID="9"
VERSION="9 (stretch)"
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"

# uname -r
4.18.16-300.fc29.x86_64

# python -V
Python 3.7.1

# pip freeze
asyncio==3.4.3
Cython==0.29.2
idna==2.8
multidict==4.5.2
uvloop==0.12.0rc2
yarl==1.3.0

I had to tweak the code a bit to run in a docker container successfully, but here is the code I used:

import asyncio
import logging
import ssl
import socket
import sys
import yarl
import uvloop

CREDENTIAL = ''

URLS = {
    'https://s3.us-west-2.amazonaws.com/archpi.dabase.com/style.css': {
        'method': 'get',
        'headers': {'User-Agent': 'Botocore/1.8.21 Python/3.6.4 Darwin/17.5.0', 'X-Amz-Date': '20180518T025044Z', 'X-Amz-Content-SHA256': 'e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855', 'Authorization': f'AWS4-HMAC-SHA256 Credential={CREDENTIAL}/20180518/us-west-2/s3/aws4_request, SignedHeaders=host;x-amz-content-sha256;x-amz-date, Signature=ae552641b9aa9a7a267fcb4e36960cd5863e55d91c9b45fd39b30fdcd2e81489', 'Accept-Encoding': 'identity'}
    },

    'https://s3.ap-southeast-1.amazonaws.com/archpi.dabase.com/doesnotexist': {
        'method': 'GET' if sys.argv[1] == 'get_object' else 'HEAD',
        'headers': {'User-Agent': 'Botocore/1.8.21 Python/3.6.4 Darwin/17.5.0', 'X-Amz-Date': '20180518T025221Z', 'X-Amz-Content-SHA256': 'e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855', 'Authorization': f'AWS4-HMAC-SHA256 Credential={CREDENTIAL}/20180518/ap-southeast-1/s3/aws4_request, SignedHeaders=host;x-amz-content-sha256;x-amz-date, Signature=7a7675ef6d70cb647ed59e02d532ffa80d437fb03976d8246ea9ef102d118794', 'Accept-Encoding': 'identity'}
    }
}

asyncio.set_event_loop_policy(uvloop.EventLoopPolicy())

class HttpClient(asyncio.streams.FlowControlMixin):
    transport = None

    def __init__(self, *args, **kwargs):
        self.__url = kwargs.pop('url')
        self.__logger = logging.getLogger()
        super().__init__()

    def connection_made(self, transport):
        self.transport = transport

        url_parts = yarl.URL(self.__url)
        entry = URLS[self.__url]

        body = f'{entry["method"]} {url_parts.path} HTTP/1.1\r\nAccept: */*\r\nHost: {url_parts.host}\r\n'
        for name, value in entry['headers'].items():
            body += f'{name}: {value}\r\n'

        body += '\r\n'
        self.transport.write(body.encode('ascii'))
        self.__logger.info(f'data sent: {body}')

    def data_received(self, data):
        self.__logger.info(f'data received: {data}')

        self.transport.close()
        # asyncio.get_event_loop().call_later(1.0, )

    def eof_received(self):
        self.__logger.info('eof_received')

    def connection_lost(self, exc):
        self.__logger.info(f'connection lost: {exc}')
        super().connection_lost(exc)

    @classmethod
    def create_factory(cls, url: str):
        def factory(*args, **kwargs):
            return cls(*args, url=url, **kwargs)

        return factory


async def test_asyncio(ssl_context):
    loop = asyncio.get_event_loop()

    url = 'https://s3.ap-southeast-1.amazonaws.com/archpi.dabase.com/doesnotexist'
    url_parts = yarl.URL(url)
    port = url_parts.port or (80 if url_parts.scheme == 'http' else 443)
    infos = await loop.getaddrinfo(url_parts.host, port, family=socket.AF_INET)
    family, type, proto, canonname, sockaddr = infos[0]
    await loop.create_connection(HttpClient.create_factory(url), sockaddr[0], port, ssl=ssl_context, family=family, proto=proto, flags=socket.AI_NUMERICHOST, server_hostname=url_parts.host, local_addr=None)


async def asyncio_test():
    ssl_context = ssl.create_default_context()

    while True:
        await test_asyncio(ssl_context)


def main():
    print('running')
    loop = asyncio.get_event_loop()
    loop.run_until_complete(asyncio_test())


main()
msg338145 - (view) Author: Yury Selivanov (yselivanov) * (Python committer) Date: 2019-03-17 22:51
New changeset f683f464259715d620777d7ed568e701337a703a by Yury Selivanov (Fantix King) in branch 'master':
bpo-34745: Fix asyncio sslproto memory issues (GH-12386)
https://github.com/python/cpython/commit/f683f464259715d620777d7ed568e701337a703a
msg338146 - (view) Author: miss-islington (miss-islington) Date: 2019-03-17 23:09
New changeset 7f7485c0605fe979e39c58b688f2bb6a4f43e65e by Miss Islington (bot) in branch '3.7':
bpo-34745: Fix asyncio sslproto memory issues (GH-12386)
https://github.com/python/cpython/commit/7f7485c0605fe979e39c58b688f2bb6a4f43e65e
msg338784 - (view) Author: Alexander Mohr (thehesiod) * Date: 2019-03-25 06:10
going to close, I've verified that it fixes my original issue, ty!!
History
Date User Action Args
2019-03-25 06:10:45thehesiodsetstatus: open -> closed
resolution: fixed
messages: + msg338784

stage: patch review -> resolved
2019-03-18 04:39:31fantixsetpull_requests: + pull_request12348
2019-03-17 23:09:17miss-islingtonsetnosy: + miss-islington
messages: + msg338146
2019-03-17 22:51:37miss-islingtonsetpull_requests: + pull_request12344
2019-03-17 22:51:15yselivanovsetmessages: + msg338145
2019-03-17 22:18:29fantixsetkeywords: + patch
stage: patch review
pull_requests: + pull_request12343
2018-12-20 22:34:52thehesiodsetversions: + Python 3.6
2018-12-20 22:32:04cnpetersonsetfiles: + memory_usage.png
nosy: + cnpeterson
messages: + msg332273

2018-09-30 03:57:02fantixsetnosy: + fantix
2018-09-19 23:13:32yselivanovsetmessages: + msg325819
2018-09-19 23:10:57thehesiodsetmessages: + msg325817
2018-09-19 22:44:28yselivanovsetmessages: + msg325813
2018-09-19 22:41:00thehesiodcreate