This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: SSLSocket.read does a GIL round-trip for every 16KB TLS record
Type: performance Stage: patch review
Components: SSL Versions: Python 3.11
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: Dima.Tisnek, Matthew Rocklin, Safihre, barry, benjamin.peterson, christian.heimes, jakirkham, josnyder, njs, pablogsal, paul.moore, r.david.murray, steve.dower, tim.golden, yselivanov, zach.ware
Priority: normal Keywords: patch

Created on 2019-06-20 18:42 by josnyder, last changed 2022-04-11 14:59 by admin.

Pull Requests
URL Status Linked Edit
PR 25478 open josnyder, 2021-04-20 04:31
PR 31492 open Safihre, 2022-02-22 07:31
Messages (12)
msg346156 - (view) Author: Josh Snyder (josnyder) * Date: 2019-06-20 18:42
Background:

SSLSocket.read drops the GIL and performs exactly one successful call to  OpenSSL's `SSL_read`, whose documentation states "At most the contents of one record will be returned". TLS records are at most 16KB, so high throughput (especially multithreaded) TLS reception can become bottlenecked on the GIL.

Proposal:

For non-blocking sockets, call SSL_read in a loop until the user-supplied limit is reached or no bytes are available on the socket. I don't know of a way to safely improve performance for blocking sockets.

Initial testing:

I performed initial testing using 32 threads pinned to 16 cores, downloading and re-assembling a single 140270MB file from a "real world" TLS sender. This resulted in a 4x increase in throughput, a 6.6x reduction in voluntary context switches, a 3.5x reduction in system time. User time did increase by 43%, so the overall reduction in CPU usage was only 2.67x.

                                 before     after
     wall clock time (s)    :    29.637     7.116
     user time (s)          :     8.793    12.584
     system time (s)        :   105.118    30.010
     user + system time (s) :   113.911    42.594
     cpu utilization (%)    :       384       599 
     voluntary switches     : 1,653,065   248,484
     speed (MB/s)           :      4733     19712

My git branch (currently a draft) is at https://github.com/hashbrowncipher/cpython/commits/faster_tls
msg379904 - (view) Author: Dima Tisnek (Dima.Tisnek) * Date: 2020-10-30 01:37
Amazing! Would you consider making a pull request out of your branch?

re: `PyErr_HasSignals` is a notable addition to the cpython API, it may need discussion, and (imo) documentation.
msg391401 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2021-04-19 20:06
Josh, could you please rebase your branch and create a pull request? The PR process will verify that you have submitted a CLA.
msg392584 - (view) Author: Safihre (Safihre) * Date: 2021-05-01 11:16
It would be very beneficial if this gets added. In our application (usenet client) I have wondered for years why we had to limit ourselves to 16k blocks and have such lower speeds compared to non SSL connections.
msg392585 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2021-05-01 11:21
The PR is still experimental and only applies to one direction. I doubt that it will be ready, reviewed, and approved for feature freeze of 3.10.
msg392882 - (view) Author: Safihre (Safihre) * Date: 2021-05-04 11:54
Understandable, as the feature freeze was yesterday :)

Just like to note that we have a 100.000 or so users (which I know is very little compared to overal number of Python users) of our application that this could really help.
 
We have used a lot of CPU cycles over the past year doing many repeated 'recv(16384)' for SSL-sockets.

Thank you!
msg392883 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2021-05-04 11:59
You could try to convince Pablo. He is the release manager for 3.10. He can grant exceptions.

I also included Benjamin, Nathaniel, and Victor in the nosy list. They have a deeper understanding of the network I/O layer than me. I mostly take care of the crypto and X.509 part of the module.
msg396780 - (view) Author: Safihre (Safihre) * Date: 2021-06-30 15:51
Is anyone available to give feedback on the remaining questions/problems in the PR?

I don't want to be pushy and if it's only changed in 3.11, I understand, but just hoping for some progress :)
Also willing to dive into it myself, but not a network/python-core specialist to really get all the details right away.
msg413900 - (view) Author: Safihre (Safihre) * Date: 2022-02-24 10:00
Could the new PR be reviewed? Thank you!
https://github.com/python/cpython/pull/31492
Documentation still needs updating, but would like feedback.

PS: Why not enable the setting the GitHub Actions workflow only need to be approved for new GitHub accounts instead of for *all* new contributors?
msg415035 - (view) Author: Safihre (Safihre) * Date: 2022-03-13 12:12
Pinging in hope for a review on https://github.com/python/cpython/pull/31492
msg415094 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2022-03-13 20:22
We have very few people that are familar with ssl module and especially with its I/O layer. I'm busy with other topics. Others are directly affected by war in Ukraine.

I'm not a particular fan of the new "eager_recv" property introduced by your PR. Also the improvement should be implemented for read and write ops.
msg415095 - (view) Author: Safihre (Safihre) * Date: 2022-03-13 20:35
Implementing for write is not needed as OpenSSL's SSL_write_ex that is used by write() already writes the whole buffer at once. Only reading OpenSSL does in the 16k segments.

The new option was introduced to prevent the compatibility problems for code that would not expect the EOF being read in a sinhle read(), as Josh explained in his initial PR. This way the faster (but breaking) behavior can be enabled explicitly. 

The info is also in the PR, in case there is time to review in the future.
History
Date User Action Args
2022-04-11 14:59:17adminsetgithub: 81536
2022-03-14 15:49:20vstinnersetnosy: - vstinner
2022-03-13 20:35:59Safihresetmessages: + msg415095
2022-03-13 20:22:51christian.heimessetmessages: + msg415094
2022-03-13 19:17:57asvetlovsetnosy: - asvetlov
2022-03-13 12:12:54Safihresetmessages: + msg415035
2022-02-24 10:00:31Safihresetmessages: + msg413900
2022-02-22 07:31:19Safihresetpull_requests: + pull_request29622
2021-07-01 16:20:03jakirkhamsetnosy: + jakirkham
2021-07-01 10:11:21christian.heimessetassignee: christian.heimes ->
components: - Installation, Interpreter Core, Library (Lib), Windows, XML, ctypes, email
versions: + Python 3.11, - Python 3.10
2021-07-01 08:40:11pitrousetnosy: + Matthew Rocklin
2021-07-01 04:31:09kevinpowell9601setassignee: christian.heimes

components: + Installation, Interpreter Core, Library (Lib), Windows, XML, ctypes, email
nosy: + paul.moore, tim.golden, r.david.murray, barry, zach.ware, steve.dower
2021-06-30 15:51:29Safihresetmessages: + msg396780
2021-05-04 11:59:45christian.heimessetnosy: + vstinner, benjamin.peterson, njs, pablogsal
messages: + msg392883
2021-05-04 11:54:59Safihresetmessages: + msg392882
2021-05-01 11:21:54christian.heimessetmessages: + msg392585
2021-05-01 11:16:09Safihresetnosy: + Safihre
messages: + msg392584
2021-04-20 04:31:39josnydersetkeywords: + patch
stage: patch review
pull_requests: + pull_request24203
2021-04-19 20:06:54christian.heimessetmessages: + msg391401
2020-10-30 01:37:22Dima.Tisneksetnosy: + Dima.Tisnek
messages: + msg379904
2020-10-21 16:48:10christian.heimessetassignee: christian.heimes -> (no value)
versions: + Python 3.10, - Python 3.9
2019-07-03 21:47:35pitrousetnosy: + asvetlov, yselivanov

versions: + Python 3.9
2019-06-20 18:42:49josnydercreate