Title: does a GIL round-trip for every 16KB TLS record
Type: performance Stage: patch review
Components: SSL Versions: Python 3.11
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: Dima.Tisnek, Matthew Rocklin, Safihre, asvetlov, barry, benjamin.peterson, christian.heimes, jakirkham, josnyder, njs, pablogsal, paul.moore, r.david.murray, steve.dower, tim.golden, vstinner, yselivanov, zach.ware
Priority: normal Keywords: patch

Created on 2019-06-20 18:42 by josnyder, last changed 2021-07-01 16:20 by jakirkham.

Pull Requests
URL Status Linked Edit
PR 25478 open josnyder, 2021-04-20 04:31
Messages (8)
msg346156 - (view) Author: Josh Snyder (josnyder) * Date: 2019-06-20 18:42
Background: drops the GIL and performs exactly one successful call to  OpenSSL's `SSL_read`, whose documentation states "At most the contents of one record will be returned". TLS records are at most 16KB, so high throughput (especially multithreaded) TLS reception can become bottlenecked on the GIL.


For non-blocking sockets, call SSL_read in a loop until the user-supplied limit is reached or no bytes are available on the socket. I don't know of a way to safely improve performance for blocking sockets.

Initial testing:

I performed initial testing using 32 threads pinned to 16 cores, downloading and re-assembling a single 140270MB file from a "real world" TLS sender. This resulted in a 4x increase in throughput, a 6.6x reduction in voluntary context switches, a 3.5x reduction in system time. User time did increase by 43%, so the overall reduction in CPU usage was only 2.67x.

                                 before     after
     wall clock time (s)    :    29.637     7.116
     user time (s)          :     8.793    12.584
     system time (s)        :   105.118    30.010
     user + system time (s) :   113.911    42.594
     cpu utilization (%)    :       384       599 
     voluntary switches     : 1,653,065   248,484
     speed (MB/s)           :      4733     19712

My git branch (currently a draft) is at
msg379904 - (view) Author: Dima Tisnek (Dima.Tisnek) * Date: 2020-10-30 01:37
Amazing! Would you consider making a pull request out of your branch?

re: `PyErr_HasSignals` is a notable addition to the cpython API, it may need discussion, and (imo) documentation.
msg391401 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2021-04-19 20:06
Josh, could you please rebase your branch and create a pull request? The PR process will verify that you have submitted a CLA.
msg392584 - (view) Author: Safihre (Safihre) Date: 2021-05-01 11:16
It would be very beneficial if this gets added. In our application (usenet client) I have wondered for years why we had to limit ourselves to 16k blocks and have such lower speeds compared to non SSL connections.
msg392585 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2021-05-01 11:21
The PR is still experimental and only applies to one direction. I doubt that it will be ready, reviewed, and approved for feature freeze of 3.10.
msg392882 - (view) Author: Safihre (Safihre) Date: 2021-05-04 11:54
Understandable, as the feature freeze was yesterday :)

Just like to note that we have a 100.000 or so users (which I know is very little compared to overal number of Python users) of our application that this could really help.
We have used a lot of CPU cycles over the past year doing many repeated 'recv(16384)' for SSL-sockets.

Thank you!
msg392883 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2021-05-04 11:59
You could try to convince Pablo. He is the release manager for 3.10. He can grant exceptions.

I also included Benjamin, Nathaniel, and Victor in the nosy list. They have a deeper understanding of the network I/O layer than me. I mostly take care of the crypto and X.509 part of the module.
msg396780 - (view) Author: Safihre (Safihre) Date: 2021-06-30 15:51
Is anyone available to give feedback on the remaining questions/problems in the PR?

I don't want to be pushy and if it's only changed in 3.11, I understand, but just hoping for some progress :)
Also willing to dive into it myself, but not a network/python-core specialist to really get all the details right away.
Date User Action Args
2021-07-01 16:20:03jakirkhamsetnosy: + jakirkham
2021-07-01 10:11:21christian.heimessetassignee: christian.heimes ->
components: - Installation, Interpreter Core, Library (Lib), Windows, XML, ctypes, email
versions: + Python 3.11, - Python 3.10
2021-07-01 08:40:11pitrousetnosy: + Matthew Rocklin
2021-07-01 04:31:09kevinpowell9601setassignee: christian.heimes

components: + Installation, Interpreter Core, Library (Lib), Windows, XML, ctypes, email
nosy: + paul.moore, tim.golden, r.david.murray, barry, zach.ware, steve.dower
2021-06-30 15:51:29Safihresetmessages: + msg396780
2021-05-04 11:59:45christian.heimessetnosy: + vstinner, benjamin.peterson, njs, pablogsal
messages: + msg392883
2021-05-04 11:54:59Safihresetmessages: + msg392882
2021-05-01 11:21:54christian.heimessetmessages: + msg392585
2021-05-01 11:16:09Safihresetnosy: + Safihre
messages: + msg392584
2021-04-20 04:31:39josnydersetkeywords: + patch
stage: patch review
pull_requests: + pull_request24203
2021-04-19 20:06:54christian.heimessetmessages: + msg391401
2020-10-30 01:37:22Dima.Tisneksetnosy: + Dima.Tisnek
messages: + msg379904
2020-10-21 16:48:10christian.heimessetassignee: christian.heimes -> (no value)
versions: + Python 3.10, - Python 3.9
2019-07-03 21:47:35pitrousetnosy: + asvetlov, yselivanov

versions: + Python 3.9
2019-06-20 18:42:49josnydercreate