This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: openssl init 100% CPU utilization on Windows
Type: behavior Stage:
Components: Library (Lib), Windows Versions: Python 2.7
process
Status: closed Resolution: out of date
Dependencies: Superseder:
Assigned To: Nosy List: BreamoreBoy, bob.bob, giampaolo.rodola, loewis, pitrou, rv, vstinner
Priority: normal Keywords:

Created on 2014-03-14 14:29 by bob.bob, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
server_side.pcap bob.bob, 2014-03-19 12:51 tcp connect with SYN packet duplication. server side dump
Messages (11)
msg213555 - (view) Author: bob bob (bob.bob) Date: 2014-03-14 14:29
We have written a server on Python 2.7.6  that uses openssl 0.9.8y running on Windows box.

Time to time our server freezes utilizing 100% CPU on ssl_init (C routine). The following is the process stack:

0021ee80 10036d19 0176d259 00000000 00000013 msvcr90+0x3b35d

0021ee9c 10022a22 0021ef1c 0176d218 1001bf62 _ssl!init_ssl+0x33b59

0021ef74 10073920 0176a158 0177ba18 00952a78 _ssl!init_ssl+0x1f862

0021ef84 10001714 0176a158 0147b350 00000000 _ssl!init_ssl+0x70760

0021ef88 0176a158 0147b350 00000000 00952a78 _ssl+0x1714

0021ef8c 0147b350 00000000 00952a78 ffffffff 0x176a158

0021ef90 00000000 00952a78 ffffffff 100d3038 0x147b350

This problem occurs when client connects to our server using slow modem channel. In this case client sends duplicated SYN TCP packet. When second SYN packet arrives to the server after SYN_ACK server's packet we have a trouble. Like this trace from server side:

140 38.807410000   172.16.35.6  192.168.52.1  TCP  74  44697 > 20444 [SYN] Seq=0 Win=14600 Len=0 MSS=1314 SACK_PERM=1 TSval=495224007 TSecr=0 WS=16

141 38.807428000   192.168.52.1 172.16.35.6   TCP  78  20444 > 44697 [SYN, ACK] Seq=0 Ack=1 Win=16384 Len=0 MSS=1460 WS=1 TSval=0 TSecr=0 SACK_PERM=1

148 39.288557000   172.16.35.6  192.168.52.1  TCP  74  44697 > 20444 [SYN] Seq=0 Win=14600 Len=0 MSS=1314 SACK_PERM=1 TSval=495224257 TSecr=0 WS=16
msg213573 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2014-03-14 17:33
Have you tried reporting this to the OpenSSL project? This doesn't sound like a bug in Python.
Also, 0.9.8y is the current latest version in the 0.9.8 line, and we are unlikely to switch the 2.7 Windows builds to OpenSSL 1.x, I think (but ultimately that's Martin call).
msg213579 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2014-03-14 17:59
Antoine: init_ssl would be the entry function of _ssl.c, not code from OpenSSL. 

bob: It's doubtful though that the stack trace is right, since init_ssl doesn't call itself. Did you install the Python PDB files for 2.7.6 before obtaining this stack trace?

There are two things you could try to diagnose this further
1. Capture the complete packets using tcpdump or wireshark, for use by tcpreplay. Then see whether a replay will reliably trigger this problem, and allow for others to diagnose this further - upload a Python script and the wireshark dump here.
2. (Alternatively) Debug this yourself. Build Python in debug mode, see whether the problem still reproduces, then find out the life lock in the Visual Studio debugger.

Please also report the Windows version you are using.
msg214090 - (view) Author: bob bob (bob.bob) Date: 2014-03-19 12:51
loewis:
This is correct stack:
0021ee80 10036d19 0176d259 00000000 00000013 msvcr90!memset+0x3d
0021ee9c 10022a22 0021ef1c 0176d218 1001bf62 _ssl!SHA1_Final+0x49
0021eea8 1001bf62 0021ef0c 0021ef1c 00000146 _ssl!OPENSSL_cleanse+0x92
0021eebc 10010319 0021ef0c 0021ef1c 00000000 _ssl!EVP_DigestFinal_ex+0x32
0021ef30 100035f3 0021ef74 00000004 00000000 _ssl!RAND_SSLeay+0x2a9
0021ef44 10085da0 0021ef74 00000004 00000000 _ssl!RAND_add+0x63
0021ef74 10073920 0176a158 0177ba18 00952a78 _ssl!ssl3_accept+0x40
0021ef84 10001714 0176a158 0147b350 00000000 _ssl!SSL_do_handshake+0x50
0021ef8c 0147b350 00000000 00952a78 ffffffff _ssl!PySSL_SSLdo_handshake+0x74 [c:\users\martin\27\python\modules\_ssl.c @ 478]
0021ef90 00000000 00952a78 ffffffff 100d3038 0x147b350

dump file attached.
Packets from 5 to 8 received after client process has been terminated. Thereafter, the CPU freed.
msg215067 - (view) Author: Roman O. Vlasov (rv) Date: 2014-03-28 18:39
To reproduce the 100% CPU load problem, we used a simple python TLS client on separate linux PC with Traffic control utility (tc):
	tc qdisc change dev eth0 root netem delay 5000ms

After in-depth analyzing, we realized that _ssl.c behaves differently depening on socket.timeout:

1) sock.timeout set to None, that is equivalent to s->socket_timeout==-1 in _ssl.c produces 100% CPU load if client works on bad long delay channel.
The problem is this case is that the do{} loop runs in PySSL_SSLdo_handshake() calling check_socket_and_wait_for_timeout() which immediatly returns SOCKET_IS_BLOCKING because s->sock_timeout==-1. 

2) sock.timeout set to 0 (non-blocking) makes _ssl.c immediatly return with error:
	_ssl.c: The operation did not complete

3) socket.timeout set to any positive value makes _ssl.c wait socket on select (producing no CPU load).

By default, accept() returns blocking socket with timeout set to None (1st case)

Below are some code details:

Our server class is inherited from asyncore.dispatcher.
In __init__ it executes the following statements (ripped), 
creating listening socket, accepting client connection and then doing wrap_socket:

-----------------------------------------------------------
asyncore.py: 
	sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
	sock.setblocking(0)
	sock.setsockopt(
		socket.SOL_SOCKET, 
		socket.SO_REUSEADDR,
		socket.getsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR) | 1
	)
	sock.bind(self.server_address)
	sock.listen(5)
	asyncore.loop(...)
	select.select(...)  # wait for client connection

# Here client connects

	conn, addr = sock.accept()
# conn.gettimeout() returns None which means blocking socket
	ssl_sock = ssl.wrap_socket(
            conn
            , server_side = server_side
            , certfile = certfile
            , cert_reqs = cert_reqs
            , ssl_version=ssl.PROTOCOL_SSLv3
        )

	File "C:\Python27\Lib\ssl.py", line 399, in wrap_socket
	File "C:\Python27\Lib\ssl.py", line 152, in __init__
		self.do_handshake()
	File "C:\Python27\Lib\ssl.py", line 315, in do_handshake
		self._sslobj.do_handshake()
_ssl.c:
	PySSL_SSLdo_handshake():
		// wants to read more data from socket
		if (err == SSL_ERROR_WANT_READ) {
			sockstate = check_socket_and_wait_for_timeout(self->Socket, 0);
		...
	check_socket_and_wait_for_timeout():
		/* Nothing to do unless we're in timeout mode (not non-blocking) */
		if (s->sock_timeout < 0.0)
			return SOCKET_IS_BLOCKING;		// <-- this is 1st case producing 100% CPU load
		else if (s->sock_timeout == 0.0)
			return SOCKET_IS_NONBLOCKING;	// <-- this is 2nd case returning error immediatly

We think that anyone who follows standard Python documentation (http://docs.python.org/2/library/ssl.html#server-side-operation) will get the same result if using client with delay>1000ms
msg215068 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2014-03-28 18:53
How come OpenSSL returns SSL_ERROR_WANT_READ if the socket is blocking?
Perhaps Windows returns a non-blocking socket for accept()?
Have you tried with Python 3?
msg215185 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2014-03-30 15:21
As Antoine says: if the socket is *really* in blocking mode, then SSL_do_handshake should block until the handshake if fully complete. It should not return SSL_ERROR_WANT_READ in this case.

If you have managed to build Python for yourself to diagnose this further, please add a call to ioctlsocket in this loop to verify that the socket is in blocking mode.
msg215766 - (view) Author: Roman O. Vlasov (rv) Date: 2014-04-08 15:10
Antoine, Martin, thank you for your replies. You was right: NT socket was not in blocking mode (in 1st case). I didn't knew how to determine socket mode in NT, so I explicitly set socket mode to blocking in _ssl.c before calling SSL_do_handshake() for test:

_ssl.c::PySSL_SSLdo_handshake():

	// ------------ Explicitly set blocking mode
	unsigned long iMode = 0UL; // If iMode = 0, blocking is enabled; 
	int iResult = ioctlsocket( self->Socket->sock_fd, FIONBIO, &iMode );
	if (iResult != NO_ERROR)
		printf("\nioctlsocket failed with error: %ld\n", iResult);
	// ------------
	PySSL_BEGIN_ALLOW_THREADS
	ret = SSL_do_handshake(self->ssl);
	...

Test result: SSL_do_handshake() did not return (and no CPU load).

But in general, I think that check_socket_and_wait_for_timeout() logic is erroneous:
	if (s->sock_timeout < 0.0)
		return SOCKET_IS_BLOCKING;

By default (see case #1 in my message above), timeout in Python wrapping object is -1 (self->Socket->sock_timeout==-1). But underlying NT socket is really non-blocking. This leads to 100% CPU load: PySSL_SSLdo_handshake() loop is calling check_socket_and_wait_for_timeout() which immediately returns.

NB. I use Python 2.7.6 and openssl-0.9.8y for my build and never tried Python 3.
msg215903 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2014-04-10 19:42
Thanks. It would be nice if you could investigate why the underlying NT socket is in non-blocking mode (AFAIK, this issue hasn't been reported before). Also, it would be nice if you could check with Python 3 as well.
msg217232 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2014-04-27 00:26
> Please also report the Windows version you are using.

I don't see the answer to this question
msg239150 - (view) Author: Mark Lawrence (BreamoreBoy) * Date: 2015-03-24 16:24
Is this still considered valid considering from https://www.python.org/downloads/release/python-279/ "The entirety of Python 3.4's ssl module has been backported for Python 2.7.9. See PEP 466 for justification." ?
History
Date User Action Args
2022-04-11 14:57:59adminsetgithub: 65123
2016-09-08 15:19:24christian.heimessetstatus: open -> closed
resolution: out of date
2015-03-24 16:24:37BreamoreBoysetnosy: + BreamoreBoy
messages: + msg239150
2014-04-27 00:26:10vstinnersetmessages: + msg217232
title: openssl init 100% CPU utilization -> openssl init 100% CPU utilization on Windows
2014-04-10 19:42:34pitrousetmessages: + msg215903
2014-04-08 15:10:44rvsetmessages: + msg215766
2014-03-30 15:21:38loewissetmessages: + msg215185
2014-03-28 18:53:27pitrousetnosy: + giampaolo.rodola
type: resource usage -> behavior
messages: + msg215068
2014-03-28 18:39:28rvsetnosy: + rv
messages: + msg215067
2014-03-19 12:51:14bob.bobsetfiles: + server_side.pcap

messages: + msg214090
2014-03-14 17:59:58loewissetmessages: + msg213579
2014-03-14 17:36:49vstinnersetnosy: + vstinner
2014-03-14 17:33:08pitrousetnosy: + loewis, pitrou
messages: + msg213573
2014-03-14 14:29:14bob.bobcreate