classification
Title: ssl documentation needs comments about non-blocking behaviour
Type: behavior Stage: resolved
Components: Documentation, Library (Lib) Versions: Python 3.2, Python 3.3
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: docs@python Nosy List: docs@python, jcea, pitrou, python-dev, vstinner
Priority: normal Keywords:

Created on 2011-06-16 01:26 by jcea, last changed 2011-07-10 23:42 by pitrou. This issue is now closed.

Messages (15)
msg138408 - (view) Author: Jesús Cea Avión (jcea) * (Python committer) Date: 2011-06-16 01:26
Combining non-blocking SSL sockets and "select()" raises "ssl.SSLError: [Errno 2] _ssl.c:503: The operation did not complete (read)" in Python 2.7.2, but works OK in 2.7.1, 2.6.* and previous.

This test shows the issue:

"""
import ssl
import socket
import select

s = socket.socket(socket.AF_INET,socket.SOCK_STREAM)
s.connect(("gmail.com", 443))
s.setblocking(0)
s = ssl.wrap_socket(s)
s.write("GET / HTTP/1.0\r\nHost: gmail.com\r\n\r\n")
select.select([s], [], [])

s.read(9999999)
print "OK!"
"""

Under python 2.6 and 2.7.1 I get "OK", but under 2.7.2 I get:

"""
Traceback (most recent call last):
  File "z.py", line 8, in <module>
    s = ssl.wrap_socket(s)
  File "/usr/local/lib/python2.7/ssl.py", line 372, in wrap_socket
    ciphers=ciphers)
  File "/usr/local/lib/python2.7/ssl.py", line 134, in __init__
    self.do_handshake()
  File "/usr/local/lib/python2.7/ssl.py", line 296, in do_handshake
    self._sslobj.do_handshake()
ssl.SSLError: [Errno 2] _ssl.c:503: The operation did not complete (read)
"""

Changing the order between the "s.setblocking()" and the "ssl.wrap_socket()" seems to solve the issue in this particular code sample, but this seems to be a serious regression in 2.7.2. I have programs in production here where changing the order doesn't solve the exception :-(. Dead in the water! :-(
msg138424 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-06-16 11:48
This is not a bug. With a non-blocking socket, the handshake itself is non-blocking, so you have to be prepared to retry. Your snippet also fails under Python 2.6 for me, so this isn't a regression either.

If you want to know how to do a non-blocking handshake, see e.g. 
http://hg.python.org/cpython/file/fd6446a88fe3/Lib/test/test_ssl.py#l632
or (using asyncore)
http://hg.python.org/cpython/file/fd6446a88fe3/Lib/test/test_ftplib.py#l299
msg138425 - (view) Author: Jesús Cea Avión (jcea) * (Python committer) Date: 2011-06-16 12:45
I am not getting the error in the handshake. I am getting it when transfering data, after a few Kbytes are already transfered.

This code has been working for the last 8 years, including 2.7.1. It is failing now, under 2.7.2. OpenSSL version haven't changed since 2.7.1.

So, it is something working under 2.7.1 that is not working under 2.7.2.

If you get an error in 2.6, I guess this is somewhat dependent of the OpenSSL version or OS. I am running Ubuntu 10.04, OpenSSL 0.9.8k-7ubuntu8.
msg138426 - (view) Author: Jesús Cea Avión (jcea) * (Python committer) Date: 2011-06-16 12:47
I am talking about the code in production, not the code I pasted yesterday.
msg138427 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-06-16 12:53
> I am not getting the error in the handshake. I am getting it when
> transfering data, after a few Kbytes are already transfered.

Your traceback (and mine as well) occurs in wrap_socket() which itself calls do_handshake(). I don't understand how you could have transferred data *before* the SSL session is established. Or perhaps the test you posted doesn't reflect the actual issue?

By the way, faulty non-blocking code can work out of pure luck, if the target server and Internet connection is faster than the local computer...
msg138428 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-06-16 12:55
Ah, ok, looks like your messages crossed each other. Can you try to devise another test case, then?
Or, at least, explain the context and how and where it fails?
msg138429 - (view) Author: Jesús Cea Avión (jcea) * (Python committer) Date: 2011-06-16 13:17
Protecting my "reads" retrying when getting this exception does the trick, but now my code is convoluted and never before I had to manage this directly. This worked fine in 2.7.1. Previously Python seemed to do the retry itself.
msg138430 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-06-16 13:20
Note: the only significant changes in the ssl module on branch 2.7 have been 742d73a99425 and 7f99ac53014a.

> Protecting my "reads" retrying when getting this exception does the
> trick, but now my code is convoluted and never before I had to manage
> this directly. This worked fine in 2.7.1. Previously Python seemed to
> do the retry itself.

I'm sure it didn't. I think you were just lucky that you had enough data to satisfy the read. With non-blocking code, by definition you have to be prepared to retry. That's especially true when using SSL, because having bytes available on the socket (as signalled by select()) doesn't mean there's anything to read on the SSL layer (for example, the bytes may not form a complete SSL frame).
msg138432 - (view) Author: Jesús Cea Avión (jcea) * (Python committer) Date: 2011-06-16 13:44
I have write a testcase, but adding a small delay between reads, allowing more data coming from the server, solves the issue, so this seems to be a race condition. Trying with a remote slow SSL server, I get the same error in python 2.6, so I guess I was being lucky for the last 8 years :-/.

Pitrou, just confirm me that I must catch & retry the exception in all my READ/WRITE, not only in the handshake.

Thanks, and sorry for wasting your time. I know about this SSL details, but my code worked just fine until 2.7.2...
msg138433 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-06-16 13:51
It's maybe because Python 2.7.2 is faster/slower? :-)
msg138434 - (view) Author: Jesús Cea Avión (jcea) * (Python committer) Date: 2011-06-16 13:57
Antoine, now I am worried about writes, since my usual aproach would be something like this:

"""
select says that it is Ok to try to write
buf = buf[socket.send(buf):]
"""

If now I can get a "retry" while writing, what is the logic?. Does Python retry automatically, internally?. Do I get a frame worth of ack and then the exception (in the next loop iteration), garanteeing nothing was written, or what?.

I think SSL module documentation should write down this captchas when using non-blocking sockets. That is, that read/write can raise exceptions even if "select" was OK.
msg138436 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-06-16 14:07
> If now I can get a "retry" while writing, what is the logic?. Does
> Python retry automatically, internally?.

No, Python doesn't retry automatically. You have to call send() again with the same buffer.
(if Python retried, it would make non-blocking calls blocking)

Also, the args[0] of the SSL error tells you what the SSL layer is blocking on (SSL_ERROR_WANT_READ or SSL_ERROR_WANT_WRITE). 

Also, with the current setup, non-blocking write() will never succeed with a partial write. See issue12197.
(that doesn't mean nothing is written on the network, though. OpenSSL may start emitting bytes and keep track internally of what remains to be sent. The OpenSSL documentation isn't very clear on that, IIRC)

> I think SSL module documentation should write down this captchas when 
> using non-blocking sockets. That is, that read/write can raise
> exceptions even if "select" was OK.

Agreed.

Relatedly, there's issue10084 for SSL support in asyncore.
msg138437 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-06-16 14:08
So let's turn this into a documentation issue, then.
msg140093 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2011-07-10 23:40
New changeset b763c1ba5589 by Antoine Pitrou in branch '3.2':
Issue #12343: Add some notes on behaviour of non-blocking SSL sockets.
http://hg.python.org/cpython/rev/b763c1ba5589

New changeset 77334eb5038d by Antoine Pitrou in branch 'default':
Issue #12343: Add some notes on behaviour of non-blocking SSL sockets.
http://hg.python.org/cpython/rev/77334eb5038d
msg140094 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-07-10 23:42
I'd say this is fixed now. Tell me if there are any precisions you would like to see added.
History
Date User Action Args
2011-07-10 23:42:05pitrousetstatus: open -> closed
versions: - Python 2.7
messages: + msg140094

resolution: fixed
stage: needs patch -> resolved
2011-07-10 23:40:40python-devsetnosy: + python-dev
messages: + msg140093
2011-06-16 14:08:32pitrousetstatus: closed -> open


assignee: docs@python
title: Python 2.7.2 regression: ssl.SSLError: [Errno 2] _ssl.c:503: The operation did not complete (read) -> ssl documentation needs comments about non-blocking behaviour
nosy: + docs@python
versions: + Python 3.2, Python 3.3
messages: + msg138437
priority: critical -> normal
components: + Documentation, Library (Lib), - Extension Modules
resolution: not a bug -> (no value)
2011-06-16 14:07:36pitrousetmessages: + msg138436
2011-06-16 13:57:27jceasetmessages: + msg138434
2011-06-16 13:51:30vstinnersetnosy: + vstinner
messages: + msg138433
2011-06-16 13:44:22jceasetmessages: + msg138432
2011-06-16 13:20:15pitrousetmessages: + msg138430
2011-06-16 13:17:13jceasetmessages: + msg138429
2011-06-16 12:55:38pitrousetmessages: + msg138428
2011-06-16 12:53:19pitrousetmessages: + msg138427
2011-06-16 12:47:41jceasetmessages: + msg138426
2011-06-16 12:45:50jceasetmessages: + msg138425
2011-06-16 11:48:26pitrousetstatus: open -> closed

nosy: + pitrou
messages: + msg138424

resolution: not a bug
2011-06-16 01:26:07jceacreate