classification
Title: ssl module doesn't support non-blocking handshakes
Type: enhancement Stage:
Components: Documentation, Library (Lib), Tests Versions: Python 3.0, Python 2.6
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: janssen Nosy List: chris.stawarz, facundobatista, giampaolo.rodola, gundlach, jafo, janssen, jcea, orsenthil, pitrou
Priority: normal Keywords: patch

Created on 2007-10-09 22:48 by chris.stawarz, last changed 2012-06-28 16:04 by pitrou. This issue is now closed.

Files
File name Uploaded Description Edit
ssl_nonblocking_handshake_patch.txt chris.stawarz, 2007-10-09 22:48
nonblocking_handshake.py chris.stawarz, 2007-10-12 17:19
blocking_handshake.py chris.stawarz, 2007-10-12 17:19
Messages (29)
msg56295 - (view) Author: Chris Stawarz (chris.stawarz) Date: 2007-10-09 22:48
The current version of the ssl module doesn't support non-blocking
creation of SSLSocket objects.  The reason for this is that the SSL
handshaking (SSL_connect/SSL_accept) takes place during the
construction of the SSLContext object (in newPySSLObject).  This means
that if the socket being wrapped is non-blocking, and the handshake
fails with SSL_ERROR_WANT_READ/SSL_ERROR_WANT_WRITE, then the entire
SSLContext is scrapped, and newPySSLObject must be run again in its
entirety.  Unfortunately, restarting from scratch on the same socket
appears to confuse the remote host, and the new attempt fails.

The attached patch fixes this problem by removing the handshaking code
from newPySSLObject and adding a do_handshake method to SSLContext.
It also adds a new parameter (do_handshake_on_connect) to the
SSLSocket constructor and the wrap_socket function.  The default value
of the parameter is True, which preserves the current behavior of the
module by immediately calling do_handshake after sslwrap.  If
do_handshake_on_connect is set to False, then the caller is
responsible for calling do_handshake.  This allows code that uses
non-blocking sockets to first create the SSLSocket and then
iteratively call do_handshake and select.select until the process
completes (which is exactly how non-blocking reads and writes are
handled).
msg56307 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2007-10-10 03:18
Assigning to Bill Janssen.
msg56309 - (view) Author: Bill Janssen (janssen) * (Python committer) Date: 2007-10-10 04:09
By a bit of synchronicity, I've been looking at non-blocking SSL issues 
all day.  Thanks, Chris.  I'll take a look and fold it in.  There are a 
number of other issues here, too, such as changing the socket's blocking-
ness after it's wrapped (which asyncore does).
msg56311 - (view) Author: Bill Janssen (janssen) * (Python committer) Date: 2007-10-10 12:27
The larger problem here is that straightforward select() just doesn't 
work with SSL-wrapped sockets.  If you depend on it for liveness, your 
program will eventually hang up.

When packets in SSL arrive at a destination, they are pulled off the 
socket in chunks of sizes controlled by the encryption protocol being 
used, decrypted, and placed in SSL-internal buffers.  The buffer content 
is then transferred to the application program through SSL_read().  If 
you've read only part of the decrypted data, there will still be pending 
input data on the SSL connection, but it won't show up on the underlying 
file descriptor via select().  Your code needs to call SSL_pending() 
explicitly to see if there is any pending data to be read.
msg56314 - (view) Author: Chris Stawarz (chris.stawarz) Date: 2007-10-10 14:49
Yeah, the pattern for doing non-blocking I/O with select() is  
different for SSL-wrapped sockets:  You always have to try the  
potentially-blocking operation first, and then call select() and  
retry in response to SSL_ERROR_WANT_READ/WRITE.  (You can also check  
SSL_pending(), but I don't think you really have to.)  Also, unlike  
normal sockets, SSL-wrapped sockets *must* be set non-blocking.

I can see how this pattern might not play nicely with asyncore.  But  
I think this is a separate (though related) issue from the one I  
reported.  As it's currently implemented, the ssl module provides no  
way of wrapping a socket without (potentially) blocking during the  
handshake, making it unusable by Twisted or any other package that  
requires all I/O to be non-blocking.  Moving the handshaking into a  
separate method solves this problem.
msg56316 - (view) Author: Bill Janssen (janssen) * (Python committer) Date: 2007-10-10 17:16
"SSL-wrapped sockets *must* be set non-blocking."

Can you say a bit more about why?
msg56318 - (view) Author: Chris Stawarz (chris.stawarz) Date: 2007-10-10 18:13
I meant that SSL-wrapped sockets must be set non-blocking in the case  
where you want to do non-blocking I/O with them using select().  This  
is another difference between SSL-wrapped sockets and normal  
sockets.  With a normal socket, as long as you use select() to know  
when a read or write won't block, it shouldn't matter whether you've  
called setblocking(False) on the socket (although there may be corner  
cases where it does).

With an SSL-wrapped socket, you have to try the I/O operation first,  
and then call select() if it fails with SSL_ERROR_WANT_READ/WRITE.   
But that won't happen if the socket is in blocking mode.  In that  
case, the OpenSSL call will just block until the operation completes  
(or an error or disconnection occurs).

That's my understanding, anyway, based on the OpenSSL man pages and  
my own usage.
msg56320 - (view) Author: Bill Janssen (janssen) * (Python committer) Date: 2007-10-10 19:07
Yes, that's correct.

I've reviewed your patch, and it looks OK to me.  I'll fold it in in the 
next go-around, once I've made some progress on the asyncore issues.
msg56379 - (view) Author: Bill Janssen (janssen) * (Python committer) Date: 2007-10-12 16:43
Chris,

Looking at this a bit harder, I don't see that it accomplishes much.  
The loop in _ssl.c/do_handshake will never return WANT_READ or 
WANT_WRITE, so the loop in the test case, for instance, is unnecessary.  
The original code handled the non-blocking case just fine.  I think this 
fix is unnecessary.
msg56380 - (view) Author: Chris Stawarz (chris.stawarz) Date: 2007-10-12 17:19
> The loop in _ssl.c/do_handshake will never return WANT_READ or
> WANT_WRITE, so the loop in the test case, for instance, is  
> unnecessary.

I don't know why you think that, but it's easy enough to show that  
this statement is incorrect.  I've attached two scripts  
(nonblocking_handshake.py and blocking_handshake.py).  The first does  
basically the same thing as my test case, but connecting to a  
different server and with some print statements added.  Here's the  
output I get when I run it using an up-to-date trunk checkout with my  
patch applied:

$ ../build/bin/python2.6 nonblocking_handshake.py
starting handshake
need read
need read
need read
handshake complete

The second reproduces the situation that led me to file this bug  
report in the first place.  Here's what happens when I run it with an  
*unpatched* trunk build:

$ ../build/bin/python2.6 blocking_handshake.py
starting handshake
need read
Traceback (most recent call last):
   File "blocking_handshake.py", line 14, in <module>
     s = ssl.wrap_socket(s,cert_reqs=ssl.CERT_NONE)
   File "/Users/cstawarz/Documents/Code/Python/svn/build/lib/ 
python2.6/ssl.py", line 466, in wrap_socket
     ssl_version=ssl_version, ca_certs=ca_certs)
   File "/Users/cstawarz/Documents/Code/Python/svn/build/lib/ 
python2.6/ssl.py", line 103, in __init__
     cert_reqs, ssl_version, ca_certs)
ssl.SSLError: [Errno 1] _ssl.c:429: error:04077068:rsa  
routines:RSA_verify:bad signature

As you see, in both cases the handshaking fails with  
SSL_ERROR_WANT_READ.  But without the fixes introduced by my patch,  
there's no way to handle the error.
msg56387 - (view) Author: Bill Janssen (janssen) * (Python committer) Date: 2007-10-13 16:52
It's my mistake; I was looking at too many patches at the same time.
Thanks for the example.

Bill
msg56454 - (view) Author: Bill Janssen (janssen) * (Python committer) Date: 2007-10-15 17:53
Perhaps we shouldn't expose this at the application level.  We could 
check, in the C module's sslwrap, whether the socket is blocking or not, 
and do the right thing there, so that sslwrap would always succeed in 
one call.  Since we are releasing the GIL whenever we do SSL_accept() or 
SSL_connect(), other threads get a chance to run, so doing it 
transparently shouldn't affect the Python program's liveness.  And it 
would reduce the chances for application error.
msg56476 - (view) Author: Chris Stawarz (chris.stawarz) Date: 2007-10-16 00:11
Bill,

You seem to be missing the whole point of doing non-blocking I/O,  
which is to handle multiple concurrent, I/O-bound tasks in a single  
OS thread.  The application must be able to try the handshake, detect  
that it didn't complete because I/O needs to take place, and then  
retry it at a later time of its choosing (presumably when select()  
says it's OK to read from the socket).  And between now and that  
later time, it can do other things, like read or write to other sockets.

The point is that the *application* must have control over when it  
does I/O, when it waits for sockets to be ready for I/O, and what it  
does in between.  There's just no way it can do this if the sslwrap()  
function doesn't return until the handshaking is complete.  sslwrap()  
can't "do the right thing", because the right thing is to return  
control to the application until it's ready to try the handshake again.

And this has nothing to do with the GIL or multiple threads.  Like I  
said, the point of doing non-blocking I/O with select() is to *avoid*  
the need for multiple threads.
msg56486 - (view) Author: Bill Janssen (janssen) * (Python committer) Date: 2007-10-16 02:29
"You seem to be missing the whole point of doing non-blocking I/O,"

You know, I think that's right.  I'm so used to using threads by now
that the whole select-based approach seems very odd to me by now.

OK, I've folded your patch into the PyPI version, ssl 1.8.  It will
meander into the trunk and 3K version as well.  Thanks for the help!

Bill

On 10/15/07, Chris Stawarz <report@bugs.python.org> wrote:
>
> Chris Stawarz added the comment:
>
> Bill,
>
> You seem to be missing the whole point of doing non-blocking I/O,
> which is to handle multiple concurrent, I/O-bound tasks in a single
> OS thread.  The application must be able to try the handshake, detect
> that it didn't complete because I/O needs to take place, and then
> retry it at a later time of its choosing (presumably when select()
> says it's OK to read from the socket).  And between now and that
> later time, it can do other things, like read or write to other sockets.
>
> The point is that the *application* must have control over when it
> does I/O, when it waits for sockets to be ready for I/O, and what it
> does in between.  There's just no way it can do this if the sslwrap()
> function doesn't return until the handshaking is complete.  sslwrap()
> can't "do the right thing", because the right thing is to return
> control to the application until it's ready to try the handshake again.
>
> And this has nothing to do with the GIL or multiple threads.  Like I
> said, the point of doing non-blocking I/O with select() is to *avoid*
> the need for multiple threads.
>
> __________________________________
> Tracker <report@bugs.python.org>
> <http://bugs.python.org/issue1251>
> __________________________________
>
msg57609 - (view) Author: Bill Janssen (janssen) * (Python committer) Date: 2007-11-17 22:45
This is now checked into the 3K branch.
msg63830 - (view) Author: Sean Reifschneider (jafo) * (Python committer) Date: 2008-03-18 00:59
Should this be back-ported to 2.6, or can it be closed?
msg63943 - (view) Author: Bill Janssen (janssen) * (Python committer) Date: 2008-03-18 16:55
I'm working on it.  I'll close it when it's finished.
msg66886 - (view) Author: Jesús Cea Avión (jcea) * (Python committer) Date: 2008-05-15 22:55
I'm hitting this issue aswell. How is going?.

I'm creating a socket with a, let say, 5 seconds timeout. The timeout
works fine before the "wrap_socket()", and after it. But the timeout
doesn't work WHILE in the "wrap_socket()" method call.

What can I do?.

If I need to call "do_handshake()" myself, working with
SSL_ERROR_WANT_READ/WRITE, I think this *needs* to be documented somewhere.

That is, any difference between "normal" sockets and "ssl" sockets need
to be documented in docs. Explicitly.

My opinion, of course :).
msg66887 - (view) Author: Bill Janssen (janssen) * (Python committer) Date: 2008-05-16 00:21
Here's what's in the 3.0 docs:

The parameter do_handshake_on_connect specifies whether to do the SSL
handshake automatically after doing a socket.connect(), or whether the
application program will call it explicitly, by invoking the
SSLSocket.do_handshake()<http://docs.python.org/dev/3.0/library/ssl.html#ssl.SSLSocket.do_handshake>method.
Calling
SSLSocket.do_handshake()<http://docs.python.org/dev/3.0/library/ssl.html#ssl.SSLSocket.do_handshake>explicitly
gives the program control over the blocking behavior of the
socket I/O involved in the handshake.

Look at test.test_ssl.testNonBlockingHandshake() in 3.0alpha or in the PyPI
module.  I'm still working on 2.6.

Bill

On Thu, May 15, 2008 at 3:56 PM, Jesús Cea Avión <report@bugs.python.org>
wrote:

>
> Jesús Cea Avión <jcea@jcea.es> added the comment:
>
> I'm hitting this issue aswell. How is going?.
>
> I'm creating a socket with a, let say, 5 seconds timeout. The timeout
> works fine before the "wrap_socket()", and after it. But the timeout
> doesn't work WHILE in the "wrap_socket()" method call.
>
> What can I do?.
>
> If I need to call "do_handshake()" myself, working with
> SSL_ERROR_WANT_READ/WRITE, I think this *needs* to be documented somewhere.
>
> That is, any difference between "normal" sockets and "ssl" sockets need
> to be documented in docs. Explicitly.
>
> My opinion, of course :).
>
> __________________________________
> Tracker <report@bugs.python.org>
> <http://bugs.python.org/issue1251>
> __________________________________
>
msg66930 - (view) Author: Jesús Cea Avión (jcea) * (Python committer) Date: 2008-05-16 11:11
Thanks, Bill. I was reading 2.6 preview documentation, and nothing is
said there.
msg72269 - (view) Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2008-09-01 16:14
This issue is yet not fixed for both Py2.6 and Py3k. The tests which are
present in code are not run (or disabled) in test_ssl.py

I understand, customers have a good chance of hitting upon this issue.
When ssl do_handshake() does not timeout and application just hangs!

Janssen, would you like to close on this?

Issue1424152 (for certain scenarios) has a dependency upon this one.
msg72283 - (view) Author: Bill Janssen (janssen) * (Python committer) Date: 2008-09-01 18:37
I believe this is now implemented in all the branches.  And when I run the 
tests, they run fine.  There's still an issue with unwrap; it does a 
blocking tear-down of the SSL session, and can block when you don't want 
it to.  I'll have to look further at that.
msg72363 - (view) Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2008-09-02 19:38
Yes Janssen, I checked again and found it implemented in both trunk
(py26) and py3k. All the tests pass as well.

However, in one of my testcases for issue1424152, where I expected the 
timeout to happen for do_handshake(), it did not take effect. I shall
look for the reasons.

The following is the tail from the traceback.  

  File "/usr/local/lib/python2.6/httplib.py", line 1095, in connect
    self.sock = ssl.wrap_socket(sock, self.key_file, self.cert_file)
  File "/usr/local/lib/python2.6/ssl.py", line 316, in wrap_socket
    suppress_ragged_eofs=suppress_ragged_eofs)
  File "/usr/local/lib/python2.6/ssl.py", line 116, in __init__
    self.do_handshake()
  File "/usr/local/lib/python2.6/ssl.py", line 260, in do_handshake
    self._sslobj.do_handshake()
KeyboardInterrupt
msg72445 - (view) Author: Bill Janssen (janssen) * (Python committer) Date: 2008-09-04 01:11
Thanks.  If you can identify a specific bug, I'll take a look at it.
msg79569 - (view) Author: Facundo Batista (facundobatista) * (Python committer) Date: 2009-01-10 19:41
Bill, should this issue be closed?

Or Senthil found a bug in the actual code and you're waiting for him to
point out where is the problem or a way to reproduce it?
msg79745 - (view) Author: Bill Janssen (janssen) * (Python committer) Date: 2009-01-13 15:35
Well, maybe he found something -- never reported back.  But it was a few
months ago...  I'm in no hurry to close it, though.
msg104372 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-04-27 21:49
do_handshake() not respecting the socket timeout was fixed in r80452.
msg164267 - (view) Author: Michael Gundlach (gundlach) Date: 2012-06-28 15:56
I recently started getting what appears to be this bug in 2.6.6 and 2.7.3 when connecting to Google.

My script does this:

  while True:
    get an IMAP connection to Google, if our old one timed out
    fetch unread messages.  if any:
      make an SMTP connection to Google and send some emails 
      mark the unread messages as read
    sleep 10 seconds

It used to run fine on 2.6.  I'm assuming something changed at Google, because it started hanging, whether in 2.6 or 2.7.  Here's the stack trace when I eventually press Ctrl-C:

 File "./main.py", line 21, in loop
    the_emails = list(emails.messages('(UNSEEN)'))
  File "./emails.py", line 129, in messages
    for chunk in chunks_of_length(32, messages):
  File "./chunks.py", line 9, in chunks_of_length
    for item in iterable:
  File "./emails.py", line 90, in email_messages
    m = open_mailbox(label, readonly=True)
  File "./emails.py", line 30, in open_mailbox
    m = imaplib.IMAP4_SSL('imap.gmail.com', 993)
  File "/usr/lib64/python2.6/imaplib.py", line 1138, in __init__
    IMAP4.__init__(self, host, port)
  File "/usr/lib64/python2.6/imaplib.py", line 163, in __init__
    self.open(host, port)
  File "/usr/lib64/python2.6/imaplib.py", line 1150, in open
    self.sslobj = ssl.wrap_socket(self.sock, self.keyfile, self.certfile)
  File "/usr/lib64/python2.6/ssl.py", line 338, in wrap_socket
    suppress_ragged_eofs=suppress_ragged_eofs)
  File "/usr/lib64/python2.6/ssl.py", line 120, in __init__
    self.do_handshake()
  File "/usr/lib64/python2.6/ssl.py", line 279, in do_handshake
    self._sslobj.do_handshake()
KeyboardInterrupt

FWIW, for the last couple of days my script is *not* hanging, so perhaps Google changed something on their end, avoiding the Python bug.

Should this Issue be reopened or should I file a new Issue?
msg164270 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-06-28 16:04
[...]
> 
> Should this Issue be reopened or should I file a new Issue?

I would prefer a new issue to be filed (assuming the error happens
again).
History
Date User Action Args
2012-06-28 16:04:26pitrousetmessages: + msg164270
2012-06-28 15:56:03gundlachsetnosy: + gundlach
messages: + msg164267
2010-04-27 21:49:23pitrousetstatus: open -> closed

nosy: + pitrou
messages: + msg104372

resolution: fixed
2009-01-13 15:35:20janssensetmessages: + msg79745
2009-01-10 19:41:48facundobatistasetnosy: + facundobatista
messages: + msg79569
2008-09-04 01:11:01janssensetmessages: + msg72445
2008-09-02 19:38:19orsenthilsetmessages: + msg72363
2008-09-01 18:37:53janssensetfiles: - unnamed
2008-09-01 18:37:43janssensetmessages: + msg72283
2008-09-01 16:14:41orsenthilsetnosy: + orsenthil
messages: + msg72269
versions: + Python 3.0
2008-05-16 11:11:24jceasetmessages: + msg66930
2008-05-16 00:21:50janssensetfiles: + unnamed
messages: + msg66887
2008-05-15 22:55:51jceasetmessages: + msg66886
2008-05-15 22:40:38jceasetnosy: + jcea
2008-03-18 16:55:10janssensetmessages: + msg63943
2008-03-18 00:59:18jafosetpriority: normal
keywords: + patch
messages: + msg63830
nosy: + jafo
2007-12-13 01:29:02giampaolo.rodolasetnosy: + giampaolo.rodola
2007-11-17 22:45:39janssensetmessages: + msg57609
2007-10-16 02:29:33janssensetmessages: + msg56486
2007-10-16 00:11:31chris.stawarzsetmessages: + msg56476
2007-10-15 17:53:09janssensetmessages: + msg56454
2007-10-13 16:52:01janssensetmessages: + msg56387
2007-10-12 17:19:51chris.stawarzsetfiles: + nonblocking_handshake.py, blocking_handshake.py
messages: + msg56380
2007-10-12 16:43:30janssensetmessages: + msg56379
2007-10-10 19:07:21janssensetmessages: + msg56320
2007-10-10 18:13:06chris.stawarzsetmessages: + msg56318
2007-10-10 17:16:53janssensetmessages: + msg56316
2007-10-10 16:39:58gvanrossumsetnosy: - gvanrossum
2007-10-10 14:50:00chris.stawarzsetmessages: + msg56314
2007-10-10 12:27:16janssensetmessages: + msg56311
2007-10-10 04:09:24janssensetmessages: + msg56309
2007-10-10 03:18:36gvanrossumsetassignee: janssen
messages: + msg56307
nosy: + gvanrossum, janssen
2007-10-09 22:48:18chris.stawarzcreate