
Memory leak while running TCP/UDPServer with socketserver.ThreadingMixIn #81374

Closed
maru-n mannequin opened this issue Jun 7, 2019 · 23 comments
Labels
3.7 (EOL) end of life · 3.8 only security fixes · 3.9 only security fixes · 3.10 only security fixes · performance Performance or resource usage · stdlib Python modules in the Lib dir

Comments

@maru-n
Mannequin

maru-n mannequin commented Jun 7, 2019

BPO 37193
Nosy @jaraco, @vstinner, @ned-deily, @vadmium, @pablogsal, @miss-islington, @maru-n
PRs
  • bpo-37193: remove thread objects which finished process its request #13893
  • [3.9] bpo-37193: remove thread objects which finished process its request (GH-13893) #23087
  • [3.8] bpo-37193: remove thread objects which finished process its request (GH-13893) #23088
  • [3.7] bpo-37193: remove thread objects which finished process its request (GH-13893) #23089
  • bpo-37193: Revert "remove thread objects which finished process its request" #23107
  • bpo-37193: remove thread objects which finished process its request #23127
  • [3.8] bpo-37193: Remove thread objects which finished process its request (GH-23127) #24032
  • [3.9] bpo-37193: Remove thread objects which finished process its request (GH-23127) #24033
  • [3.8] bpo-37193: Remove thread objects which finished process its request (GH-23127) #24749
  • [3.9] bpo-37193: Remove thread objects which finished process its request (GH-23127) #24750
Files
  • threadingmixin_memory_usage.png

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = None
    closed_at = <Date 2020-12-31.20:26:17.417>
    created_at = <Date 2019-06-07.11:53:39.380>
    labels = ['3.8', '3.9', '3.10', 'performance', '3.7', 'library']
    title = 'Memory leak while running TCP/UDPServer with socketserver.ThreadingMixIn'
    updated_at = <Date 2021-03-11.12:01:19.341>
    user = 'https://github.com/maru-n'

    bugs.python.org fields:

    activity = <Date 2021-03-11.12:01:19.341>
    actor = 'vstinner'
    assignee = 'none'
    closed = True
    closed_date = <Date 2020-12-31.20:26:17.417>
    closer = 'pablogsal'
    components = ['Library (Lib)']
    creation = <Date 2019-06-07.11:53:39.380>
    creator = 'maru-n'
    dependencies = []
    files = ['48399']
    hgrepos = []
    issue_num = 37193
    keywords = ['patch', '3.7regression']
    message_count = 23.0
    messages = ['344926', '345606', '345796', '345815', '345817', '346413', '358133', '358457', '358647', '371339', '371431', '371433', '380168', '380172', '380209', '380212', '380217', '380237', '380354', '384136', '388101', '388104', '388503']
    nosy_count = 8.0
    nosy_names = ['jaraco', 'vstinner', 'ned.deily', 'martin.panter', 'pablogsal', 'miss-islington', 'maru-n', 'Wei Li']
    pr_nums = ['13893', '23087', '23088', '23089', '23107', '23127', '24032', '24033', '24749', '24750']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'resource usage'
    url = 'https://bugs.python.org/issue37193'
    versions = ['Python 3.7', 'Python 3.8', 'Python 3.9', 'Python 3.10']

    @maru-n
    Mannequin Author

    maru-n mannequin commented Jun 7, 2019

    A UDP/TCPServer combined with the socketserver.ThreadingMixIn class (and likewise the ThreadingTCPServer and ThreadingUDPServer classes) appears to leak memory while the server is running.

    https://docs.python.org/3/library/socketserver.html#socketserver.ThreadingMixIn

    The code I wrote to check this is the following:

    import socket
    import socketserver
    import threading
    import time


    class ThreadedTCPRequestHandler(socketserver.BaseRequestHandler):
        def handle(self):
            # Echo the request back, prefixed with the handling thread's name.
            data = str(self.request.recv(1024), 'ascii')
            cur_thread = threading.current_thread()
            response = bytes("{}: {}".format(cur_thread.name, data), 'ascii')
            self.request.sendall(response)


    if __name__ == "__main__":
        HOST, PORT = "localhost", 0

        server = socketserver.ThreadingTCPServer((HOST, PORT), ThreadedTCPRequestHandler)
        server.daemon_threads = False
        server.block_on_close = True

        with server:
            ip, port = server.server_address

            # Run the server loop in a background thread.
            server_thread = threading.Thread(target=server.serve_forever)
            server_thread.daemon = True
            server_thread.start()

            # Issue 1000 short-lived client connections; each one spawns a
            # new request thread on the server side.
            for i in range(1000):
                with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
                    sock.connect(server.server_address)
                    sock.sendall(bytes("hello", 'ascii'))
                    response = str(sock.recv(1024), 'ascii')
                    print("Received: {}".format(response))
                time.sleep(0.01)

            server.server_close()
            server.shutdown()


    (I wrote this based on https://docs.python.org/3/library/socketserver.html#asynchronous-mixins)

    Then I checked memory usage with a profiling tool (the memory-profiler module, https://pypi.org/project/memory-profiler/):

    $ mprof run python mycode.py
    $ mprof plot

    I attached the resulting plot.

    I also ran the check for a much longer time and found that memory usage increased without bound.

    My environment:

    Hardware: MacBook Pro (15-inch, 2018)
    OS: MacOS 10.14
    Python 3.7.1

    I guess this is caused by thread objects not being released even though the threads have finished processing their requests, so thread objects accumulate without limit until server_close() is called.
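
    For context, here is a rough paraphrase of the relevant bookkeeping in ThreadingMixIn as of Python 3.7 (illustrative, not an exact copy of Lib/socketserver.py): every non-daemon request thread is appended to self._threads, and that list is only drained when server_close() runs.

    import threading

    class _ThreadingMixInParaphrase:
        """Rough paraphrase of socketserver.ThreadingMixIn in CPython 3.7
        (illustrative, not the exact stdlib code)."""
        daemon_threads = False
        block_on_close = True
        _threads = None

        def process_request(self, request, client_address):
            t = threading.Thread(target=self.process_request_thread,
                                 args=(request, client_address))
            t.daemon = self.daemon_threads
            if not t.daemon and self.block_on_close:
                if self._threads is None:
                    self._threads = []
                self._threads.append(t)   # grows by one entry per request...
            t.start()

        def server_close(self):
            if self.block_on_close:
                threads = self._threads   # ...and is only emptied here
                self._threads = None
                if threads:
                    for thread in threads:
                        thread.join()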

    @maru-n maru-n mannequin added the 3.7 (EOL), stdlib, and performance labels Jun 7, 2019
    @WeiLi
    Mannequin

    WeiLi mannequin commented Jun 14, 2019

    I got the same problem when using the ThreadingTCPServer.

    I think adding
    "self._threads = list(filter(lambda x: x.is_alive(), self._threads))"
    as the last line of the process_request method is a potential way to fix the bug.
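
    A minimal sketch of that idea as a user-level subclass (the class name is illustrative; this is not the patch that was eventually merged):

    import socketserver

    class PrunedThreadingTCPServer(socketserver.ThreadingTCPServer):
        def process_request(self, request, client_address):
            super().process_request(request, client_address)
            # Drop references to request threads that have already finished,
            # so the internal _threads list cannot grow without bound.
            if self._threads is not None:
                self._threads = list(filter(lambda t: t.is_alive(), self._threads))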

    @vadmium
    Member

    vadmium commented Jun 17, 2019

    Looking at the code, this would be caused by bpo-31233. I expect 3.7+ is affected. 3.6 has similar code, but the leaking looks to be disabled by default. 2.7 doesn't collect a "_threads" list at all.

    Looks like Victor was aware of the leak when he changed the code: <https://bugs.python.org/issue31233#msg304619>, but maybe he pushed the code and then forgot about the problem.

    A possible problem with Norihiro's solution is modifying the "_threads" list from multiple threads without any synchronization. (I am not sure whether that is a problem, or whether it is guaranteed to be okay due to the GIL etc.) Also, since the thread is removing itself from the list, it will still run for a short while after the removal, so there is a window during which the "server_close" method will not wait for that thread. It might also defeat the "dangling thread" accounting that I believe was Victor's motivation for his change.

    Wei's proposal is to clean up when a new request is handled, which relies on a new request coming in to free up memory. Perhaps we could use a similar strategy to the Forking mixin, which I believe cleans up expired children periodically without relying on a new request; see the sketch below.
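
    A sketch of that periodic approach, using the service_actions() hook that serve_forever() already calls on each loop iteration (the class name is illustrative; not the fix that was merged):

    import socketserver

    class PeriodicallyPrunedTCPServer(socketserver.ThreadingTCPServer):
        def service_actions(self):
            super().service_actions()
            # Called regularly by serve_forever(), so finished threads are
            # pruned even if no new request ever arrives.
            if self._threads is not None:
                self._threads = [t for t in self._threads if t.is_alive()]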

    @vstinner
    Member

    PR 13893 with an additional lock sounds like a reasonable solution. The code should be skipped if the thread is a daemon thread.
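
    A rough sketch of that shape (names are illustrative, and this is only a guess at what such a change might look like, not the actual PR):

    import socketserver
    import threading

    class LockedThreadingTCPServer(socketserver.ThreadingTCPServer):
        # Illustrative lock protecting the shared _threads list.
        _threads_lock = threading.Lock()

        def process_request_thread(self, request, client_address):
            try:
                super().process_request_thread(request, client_address)
            finally:
                t = threading.current_thread()
                if not t.daemon:  # daemon threads are never tracked, so skip them
                    with self._threads_lock:
                        if self._threads is not None and t in self._threads:
                            self._threads.remove(t)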

    @vstinner
    Member

    Martin Panter: In addition to the PR 13893 change, what do you think of also using a weakref? It might make the code even more reliable if something goes wrong.

    @vstinner
    Member

    I marked bpo-37389 as a duplicate of this issue:

    """
    msg346410 - (view) Author: Daniel W Forsyth (danf@dataforge.on.ca) Date: 2019-06-24 14:53

    After putting a basic ThreadingUDPServer under load (500 messages per second), I noticed that after a night it was consuming a lot of RAM, given that it does nothing with the data.

    On inspection, I noticed the _threads count inside the server was growing forever even though the sub-threads are done.

    Set up a basic ThreadingUDPServer with a handler that does nothing and check the request_queue_size; it seems to grow without limit.

    msg346411 - (view) Author: Daniel W Forsyth (danf@dataforge.on.ca) Date: 2019-06-24 14:59

    The only way I could figure out to control it was to do this in a thread:

            for thread in server._threads:  # type: Thread
                if not thread.is_alive():
                    server._threads.remove(thread)

    Shouldn't the server process do this when the thread is done?
    """

    @jaraco
    Member

    jaraco commented Dec 9, 2019

    This issue was also reported in the prometheus client, where the workaround was to use daemon threads.
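
    In code form, that workaround is simply the following (with the trade-off that in-flight requests may be cut off when the interpreter exits):

    import socketserver

    class DaemonThreadTCPServer(socketserver.ThreadingTCPServer):
        # With daemon_threads = True, ThreadingMixIn (Python 3.7+) does not
        # keep the worker thread in its _threads list, so the list cannot grow.
        daemon_threads = True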

    @vadmium
    Member

    vadmium commented Dec 15, 2019

    Another workaround might be to set the new "block_on_close" flag (bpo-33540) to False on the server subclass or instance.

    Victor: Replying to <https://bugs.python.org/issue37193#msg345817> "What do I think of also using a weakref?", I assume you mean maintaining "_threads" as a WeakSet rather than a list object. That seems like a nice way to solve the problem, but it seems redundant to me if other code such as Maru's proposal were also added to clean up the list.
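
    Both ideas in sketch form (class names are illustrative; neither is the change that was eventually merged):

    import socketserver
    import threading
    import weakref

    # Workaround on a server subclass: don't track or join request threads.
    class NonBlockingTCPServer(socketserver.ThreadingTCPServer):
        block_on_close = False

    # Sketch of the WeakSet idea: finished threads fall out of the set on
    # their own once their Thread objects are garbage-collected.
    class WeakSetThreadingTCPServer(socketserver.ThreadingTCPServer):
        def process_request(self, request, client_address):
            t = threading.Thread(target=self.process_request_thread,
                                 args=(request, client_address))
            t.daemon = self.daemon_threads
            if not t.daemon and self.block_on_close:
                if self._threads is None:
                    self._threads = weakref.WeakSet()
                self._threads.add(t)
            t.start()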

    @vadmium
    Member

    vadmium commented Dec 18, 2019

    FTR I have been trialling a patched Python 3.7 based on Maru's changes (revision 6ac217c) + review suggestions, and it has reduced the size of the leak (hit 1 GB over a couple days, vs only 60 MB increase over three days). The remaining leak could be explained by bpo-37788.

    @ned-deily
    Member

    I note this is marked as a 3.7regression and still open. Since the cutoff for the final 3.7 bugfix mode release is in a few days, I'm assuming this means that 3.7 users will have to live with this regression. If you feel that is a problem, speak up now.

    @gpshead gpshead added the 3.8, 3.9, and 3.10 labels Jun 12, 2020
    @jaraco
    Member

    jaraco commented Jun 13, 2020

    Thanks for the notice Ned. I've revived the PR and addressed all the comments from Victor. Any chance this can get into Python 3.7?

    @ned-deily
    Member

    Any chance this can get into Python 3.7?

    Perhaps but there's a lot that needs to be done yet. Like any bugfix, it needs to be reviewed, merged to master, and get some buildbot exposure first before it is backported anywhere.

    @jaraco
    Member

    jaraco commented Nov 1, 2020

    New changeset c415590 by MARUYAMA Norihiro in branch 'master':
    bpo-37193: remove thread objects which finished process its request (GH-13893)
    c415590

    @pablogsal
    Member

    Commit c415590 has introduced reference leaks:

    ----------------------------------------------------------------------
    Ran 202 tests in 21.654s
    OK (skipped=1)
    ......
    test_logging leaked [20, 20, 20] references, sum=60
    test_logging leaked [20, 20, 20] memory blocks, sum=60
    2 tests failed again:
    test_logging test_socketserver
    == Tests result: FAILURE then FAILURE ==

    Example buildbot failure:

    https://buildbot.python.org/all/#/builders/562/builds/79/steps/5/logs/stdio

    As there is a release of 3.10 alpha 2 tomorrow, it would be great if this could be fixed by then.

    @vstinner
    Member

    vstinner commented Nov 2, 2020

    The change fixing a leak in socketserver introduces a leak in socketserver :-)

    $ ./python -m test test_socketserver -u all -m test.test_socketserver.SocketServerTest.test_ThreadingTCPServer -R 3:3
    0:00:00 load avg: 0.95 Run tests sequentially
    0:00:00 load avg: 0.95 [1/1] test_socketserver
    beginning 6 repetitions
    123456
    ......
    test_socketserver leaked [3, 3, 3] references, sum=9
    test_socketserver leaked [3, 3, 3] memory blocks, sum=9
    test_socketserver failed

    == Tests result: FAILURE ==

    1 test failed:
    test_socketserver

    Total duration: 497 ms
    Tests result: FAILURE

    @vstinner
    Member

    vstinner commented Nov 2, 2020

    I rejected the backport to 3.8 and 3.9 since the change causes a regression on master.

    @jaraco
    Member

    jaraco commented Nov 2, 2020

    I recommend a rollback. I’ll try to get to it later today.

    @pablogsal
    Member

    New changeset aca67da by Jason R. Coombs in branch 'master':
    Revert "bpo-37193: remove thread objects which finished process its request (GH-13893)" (GH-23107)
    aca67da

    @jaraco
    Member

    jaraco commented Nov 4, 2020

    I filed bpo-42263 to capture the underlying cause of the memory leak that led to the buildbot failures and the rollback.

    @pablogsal
    Member

    New changeset b5711c9 by Jason R. Coombs in branch 'master':
    bpo-37193: Remove thread objects which finished process its request (GH-23127)
    b5711c9

    @miss-islington
    Contributor

    New changeset 0e76157 by Miss Islington (bot) in branch '3.9':
    [3.9] bpo-37193: Remove thread objects which finished process its request (GH-23127) (GH-24750)
    0e76157

    @miss-islington
    Contributor

    New changeset 0064d56 by Miss Islington (bot) in branch '3.8':
    [3.8] bpo-37193: Remove thread objects which finished process its request (GH-23127) (GH-24749)
    0064d56

    @vstinner
    Member

    Thank you for fixing the regression Jason R. Coombs ;-)

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022