classification
Title: Memory leak while running TCP/UDPServer with socketserver.ThreadingMixIn
Type: resource usage Stage: resolved
Components: Library (Lib) Versions: Python 3.10, Python 3.9, Python 3.8, Python 3.7
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: Wei Li, jaraco, martin.panter, maru-n, miss-islington, ned.deily, pablogsal, vstinner
Priority: normal Keywords: 3.7regression, patch

Created on 2019-06-07 11:53 by maru-n, last changed 2021-03-11 12:01 by vstinner. This issue is now closed.

Files
File name Uploaded Description Edit
threadingmixin_memory_usage.png maru-n, 2019-06-07 11:53
Pull Requests
URL Status Linked Edit
PR 13893 merged maru-n, 2019-06-07 13:02
PR 23087 closed miss-islington, 2020-11-01 23:51
PR 23088 closed miss-islington, 2020-11-01 23:51
PR 23089 closed miss-islington, 2020-11-01 23:51
PR 23107 merged jaraco, 2020-11-02 13:56
PR 23127 merged jaraco, 2020-11-03 14:04
PR 24032 closed miss-islington, 2021-01-01 00:37
PR 24033 closed miss-islington, 2021-01-01 00:37
PR 24749 merged miss-islington, 2021-03-04 15:49
PR 24750 merged miss-islington, 2021-03-04 15:49
Messages (23)
msg344926 - (view) Author: Norihiro Maruyama (maru-n) * Date: 2019-06-07 11:53
UDP/TCPServer with socketserver.ThreadingMixin class (also ThreadingTCPServer and ThreadingUDPServer class) seems to be memory leak while running the server.

https://docs.python.org/3/library/socketserver.html#socketserver.ThreadingMixIn

My code which wrote to check this is the following.

```
class ThreadedTCPRequestHandler(socketserver.BaseRequestHandler):
    def handle(self):
        data = str(self.request.recv(1024), 'ascii')
        cur_thread = threading.current_thread()
        response = bytes("{}: {}".format(cur_thread.name, data), 'ascii')
        self.request.sendall(response)


if __name__ == "__main__":
    HOST, PORT = "localhost", 0

    server = socketserver.ThreadingTCPServer((HOST, PORT), ThreadedTCPRequestHandler)
    server.daemon_threads = False
    server.block_on_close = True

    with server:
        ip, port = server.server_address

        server_thread = threading.Thread(target=server.serve_forever)

        server_thread.daemon = True
        server_thread.start()

        for i in range(1000):
            with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
                sock.connect(server.server_address)
                sock.sendall(bytes("hello", 'ascii'))
                response = str(sock.recv(1024), 'ascii')
                print("Received: {}".format(response))
            time.sleep(0.01)

        server.server_close()
        server.shutdown()
```
( I wrote this based on https://docs.python.org/3/library/socketserver.html#asynchronous-mixins)

Then I checked memory usage with profiling tool.
(I used memory-profiler module https://pypi.org/project/memory-profiler/)

$ mprof run python mycode.py
$ mprof plot

I attached result plot.

And also I checked this also more long time and I found memory usage was increased endlessly.

My environment is

Hardware: MacBook Pro (15-inch, 2018)
OS: MacOS 10.14
Python 3.7.1


I guess it caused by a thread object is not released in spite of the thread finished to process request and thread object will be made infinitely until server_close() is called.
msg345606 - (view) Author: Wei Li (Wei Li) Date: 2019-06-14 16:07
I got the same problem when uing the ThreadingTCPServer.

I think adding 
"self._threads = list(filter(lambda x: x.is_alive(), self._threads))"
at the last line in process_request method is a potential way to fix the bug
msg345796 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2019-06-17 01:47
Looking at the code, this would be caused by Issue 31233. I expect 3.7+ is affected. 3.6 has similar code, but the leaking looks to be disabled by default. 2.7 doesn't collect a "_threads" list at all.

Looks like Victor was aware of the leak when he changed the code: <https://bugs.python.org/issue31233#msg304619>, but maybe he pushed the code and then forgot about the problem.

A possible problem with Norihiro's solution is modifying the "_threads" list from multiple threads without any synchronization. (Not sure if that is a problem, or is it guaranteed to be okay due to GIL etc?) Also, since the thread is removing itself from the list, it will still run a short while after the removal, so there is a window when the "server_close" method will not wait for that thread. Might also defeat the "dangling thread" accounting that I believe was Victor's motivation for his change.

Wei's proposal is to check for cleaning up when a new request is handled. That relies on a new request coming in to free up memory. Perhaps we could use similar strategy to the Forking mixin, which I believe cleans up expired children periodically, without relying on a new request.
msg345815 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-06-17 08:16
PR 13893 with an additional lock sounds like a reasonable solution. The code should be skipped if the thread is a daemon thread.
msg345817 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-06-17 08:20
Martin Panter: In addition to PR 13893 change, what do you think of also using a weakref? It might make the code even more reliable if something goes wrong.
msg346413 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-06-24 15:01
I marked bpo-37389 as a duplicate of this issue:

"""
msg346410 - (view) 	Author: Daniel W Forsyth (danf@dataforge.on.ca) 	Date: 2019-06-24 14:53

After putting a basic ThreadingUDPServer under load (500 messages per/second)  I noticed that after a night it was consuming a lot of RAM given it does nothing with the data.

On inception, I noticed the _thread count inside the server was growing forever even though the sub-threads are done.

Setup a basic ThreadingUDPSever with handler that does nothing and check the request_queue_size, it seems to grow without limit.

msg346411 - (view) 	Author: Daniel W Forsyth (danf@dataforge.on.ca) 	Date: 2019-06-24 14:59

The only way I could figure out to control it was to do this in a thread;

        for thread in server._threads:  # type: Thread
            if not thread.is_alive():
                server._threads.remove(thread)

Shouldn't the server process do this when the thread is done?
"""
msg358133 - (view) Author: Jason R. Coombs (jaraco) * (Python committer) Date: 2019-12-09 19:26
This issue was also [reported in prometheus client](https://salsa.debian.org/debian/python-prometheus-client/commit/5aa256d8aab3b81604b855dc03f260342fc391fb) where the workaround was to use daemon threads.
msg358457 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2019-12-15 22:37
Another workaround might be to set the new "block_on_close" flag (Issue 33540) to False on the server subclass or instance.

Victor: Replying to <https://bugs.python.org/issue37193#msg345817> "What do I think of also using a weakref?", I assume you mean maintaining "_threads" as a WeakSet rather than a list object. That seems a nice way to solve the problem, but it seems redundant to me if other code such as Maru's proposal was also added to clean up the list.
msg358647 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2019-12-18 22:08
FTR I have been trialling a patched Python 3.7 based on Maru's changes (revision 6ac217c) + review suggestions, and it has reduced the size of the leak (hit 1 GB over a couple days, vs only 60 MB increase over three days). The remaining leak could be explained by Issue 37788.
msg371339 - (view) Author: Ned Deily (ned.deily) * (Python committer) Date: 2020-06-12 08:31
I note this is marked as a 3.7regression and still open. Since the cutoff for the final 3.7 bugfix mode release is in a few days, I'm assuming this means that 3.7 users will have to live with this regression.  If you feel that is a problem, speak up now.
msg371431 - (view) Author: Jason R. Coombs (jaraco) * (Python committer) Date: 2020-06-13 01:36
Thanks for the notice Ned. I've revived the PR and addressed all the comments from Victor. Any chance this can get into Python 3.7?
msg371433 - (view) Author: Ned Deily (ned.deily) * (Python committer) Date: 2020-06-13 03:53
> Any chance this can get into Python 3.7?

Perhaps but there's a lot that needs to be done yet. Like any bugfix, it needs to be reviewed, merged to master, and get some buildbot exposure first before it is backported anywhere.
msg380168 - (view) Author: Jason R. Coombs (jaraco) * (Python committer) Date: 2020-11-01 23:51
New changeset c41559021213cfc9dc62a83fc63306b3bdc3e64b by MARUYAMA Norihiro in branch 'master':
bpo-37193: remove thread objects which finished process its request (GH-13893)
https://github.com/python/cpython/commit/c41559021213cfc9dc62a83fc63306b3bdc3e64b
msg380172 - (view) Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2020-11-02 01:43
Commit c41559021213cfc9dc62a83fc63306b3bdc3e64b has introduced reference leaks:

----------------------------------------------------------------------
Ran 202 tests in 21.654s
OK (skipped=1)
......
test_logging leaked [20, 20, 20] references, sum=60
test_logging leaked [20, 20, 20] memory blocks, sum=60
2 tests failed again:
    test_logging test_socketserver
== Tests result: FAILURE then FAILURE ==

Example buildbot failure:

https://buildbot.python.org/all/#/builders/562/builds/79/steps/5/logs/stdio

As there is a release of 3.10 alpha 2 tomorrow I would be great if this could be fixed by tomorrow.
msg380209 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-11-02 11:43
The change fixing a leak in socketserver introduces a leak in socketserver :-)

$ ./python -m test test_socketserver -u all -m test.test_socketserver.SocketServerTest.test_ThreadingTCPServer -R 3:3
0:00:00 load avg: 0.95 Run tests sequentially
0:00:00 load avg: 0.95 [1/1] test_socketserver
beginning 6 repetitions
123456
......
test_socketserver leaked [3, 3, 3] references, sum=9
test_socketserver leaked [3, 3, 3] memory blocks, sum=9
test_socketserver failed

== Tests result: FAILURE ==

1 test failed:
    test_socketserver

Total duration: 497 ms
Tests result: FAILURE
msg380212 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-11-02 11:49
I rejected the backport to 3.8 and 3.9 since the change causes a regression on master.
msg380217 - (view) Author: Jason R. Coombs (jaraco) * (Python committer) Date: 2020-11-02 13:10
I recommend a rollback. I’ll try to get to it later today.
msg380237 - (view) Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2020-11-02 16:49
New changeset aca67da4fe68d5420401ac1782203d302875eb27 by Jason R. Coombs in branch 'master':
Revert "bpo-37193: remove thread objects which finished process its request (GH-13893)" (GH-23107)
https://github.com/python/cpython/commit/aca67da4fe68d5420401ac1782203d302875eb27
msg380354 - (view) Author: Jason R. Coombs (jaraco) * (Python committer) Date: 2020-11-04 19:43
I filed issue42263 to capture the underlying cause of the memory leak that led to the buildbot failures and the rollback.
msg384136 - (view) Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2020-12-31 20:19
New changeset b5711c940f70af89f2b4cf081a3fcd83924f3ae7 by Jason R. Coombs in branch 'master':
bpo-37193: Remove thread objects which finished process its request (GH-23127)
https://github.com/python/cpython/commit/b5711c940f70af89f2b4cf081a3fcd83924f3ae7
msg388101 - (view) Author: miss-islington (miss-islington) Date: 2021-03-04 16:36
New changeset 0e76157b0ca70bd38157fed56680a7752a52668d by Miss Islington (bot) in branch '3.9':
[3.9] bpo-37193: Remove thread objects which finished process its request (GH-23127) (GH-24750)
https://github.com/python/cpython/commit/0e76157b0ca70bd38157fed56680a7752a52668d
msg388104 - (view) Author: miss-islington (miss-islington) Date: 2021-03-04 16:55
New changeset 0064d561b8e44f7a991188d7c5016c165bc89326 by Miss Islington (bot) in branch '3.8':
[3.8] bpo-37193: Remove thread objects which finished process its request (GH-23127) (GH-24749)
https://github.com/python/cpython/commit/0064d561b8e44f7a991188d7c5016c165bc89326
msg388503 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-03-11 12:01
Thank you for fixing the regression Jason R. Coombs ;-)
History
Date User Action Args
2021-03-11 12:01:19vstinnersetmessages: + msg388503
2021-03-04 16:55:43miss-islingtonsetmessages: + msg388104
2021-03-04 16:36:50miss-islingtonsetmessages: + msg388101
2021-03-04 15:49:32miss-islingtonsetpull_requests: + pull_request23520
2021-03-04 15:49:16miss-islingtonsetpull_requests: + pull_request23519
2021-01-01 00:37:39miss-islingtonsetpull_requests: + pull_request22874
2021-01-01 00:37:29miss-islingtonsetpull_requests: + pull_request22873
2020-12-31 20:26:17pablogsalsetstatus: open -> closed
resolution: fixed
stage: patch review -> resolved
2020-12-31 20:19:39pablogsalsetmessages: + msg384136
2020-11-04 19:43:56jaracosetmessages: + msg380354
2020-11-03 14:04:47jaracosetpull_requests: + pull_request22043
2020-11-02 16:49:31pablogsalsetmessages: + msg380237
2020-11-02 13:56:01jaracosetpull_requests: + pull_request22024
2020-11-02 13:10:52jaracosetmessages: + msg380217
2020-11-02 11:49:15vstinnersetmessages: + msg380212
2020-11-02 11:43:41vstinnersetmessages: + msg380209
2020-11-02 01:43:32pablogsalsetnosy: + pablogsal
messages: + msg380172
2020-11-01 23:51:34miss-islingtonsetpull_requests: + pull_request22006
2020-11-01 23:51:27miss-islingtonsetpull_requests: + pull_request22005
2020-11-01 23:51:19miss-islingtonsetnosy: + miss-islington
pull_requests: + pull_request22004
2020-11-01 23:51:12jaracosetmessages: + msg380168
2020-06-13 03:53:46ned.deilysetmessages: + msg371433
2020-06-13 01:36:55jaracosetmessages: + msg371431
2020-06-12 22:37:53gregory.p.smithsetversions: + Python 3.8, Python 3.9, Python 3.10
2020-06-12 08:31:26ned.deilysetnosy: + ned.deily
messages: + msg371339
2019-12-18 22:08:26martin.pantersetmessages: + msg358647
2019-12-15 22:37:42martin.pantersetmessages: + msg358457
2019-12-09 19:26:01jaracosetmessages: + msg358133
2019-12-09 19:17:40jaracosetnosy: + jaraco
2019-06-24 15:01:41vstinnersetmessages: + msg346413
2019-06-24 15:01:07vstinnerlinkissue37389 superseder
2019-06-17 08:20:03vstinnersetmessages: + msg345817
2019-06-17 08:16:18vstinnersetmessages: + msg345815
2019-06-17 01:47:45martin.pantersetkeywords: + 3.7regression
nosy: + martin.panter, vstinner
messages: + msg345796

2019-06-14 16:07:44Wei Lisetnosy: + Wei Li
messages: + msg345606
2019-06-07 13:02:57maru-nsetkeywords: + patch
stage: patch review
pull_requests: + pull_request13767
2019-06-07 11:53:39maru-ncreate