classification
Title: SSLSocket.send() returns 0 for non-blocking socket
Type: enhancement
Stage: resolved
Components: Library (Lib)
Versions: Python 3.5
process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To:
Nosy List: Ben.Darnell, christian.heimes, giampaolo.rodola, janssen, nikratio, pitrou, python-dev, r.david.murray
Priority: normal
Keywords: patch

Created on 2014-03-16 21:16 by nikratio, last changed 2014-05-01 12:09 by pitrou. This issue is now closed.

Files
File name               Uploaded
issue20951.diff         nikratio, 2014-03-16 21:52
issue20951.diff         nikratio, 2014-03-19 01:16
issue20951.diff         nikratio, 2014-03-21 00:57
deprecation_patch.diff  nikratio, 2014-03-25 02:04
docpatch.diff           nikratio, 2014-03-27 03:48
issue20951_r2.diff      nikratio, 2014-04-27 22:40
issue20951_r3.diff      nikratio, 2014-04-29 02:51
Messages (31)
msg213759 - (view) Author: Nikolaus Rath (nikratio) * Date: 2014-03-16 21:16
When using non-blocking operation, the SSLSocket.send method returns 0 if no data can be sent at this point.

This is counterintuitive, because in the same situation (write to non-blocking socket that isn't ready for IO):

 * A regular (non-SSL) socket raises BlockingIOError
 * libc's send(2) does not return 0, but fails with EAGAIN or EWOULDBLOCK (it returns -1 and sets errno).
 * OpenSSL's SSL_write() does not return 0, but reports an SSL_ERROR_WANT_WRITE error.
 * The ssl module's documentation describes the SSLWantWriteError exception as "A subclass of SSLError raised by a non-blocking SSL socket when trying to read or write data, but more data needs to be sent on the underlying TCP transport before the request can be fulfilled."
 * Consistent with that, trying to *read* from a non-blocking SSLSocket when no data is ready raises SSLWantReadError, instead of returning zero.

This behavior also makes it more complicated to write code that works with both SSLSockets and regular sockets.


Since the current behavior is undocumented at best (and contradicts the documentation at worst), can we change this in Python 3.5?
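
For comparison, the plain-socket behaviour mentioned in the list above can be reproduced without SSL. The following is a minimal sketch, assuming a local socket pair whose send buffer eventually fills; the helper name fill_until_blocked and the chunk size are illustrative and not part of the report.

```python
import socket


def fill_until_blocked(sock, chunk=b"x" * 65536, max_iterations=100_000):
    """Send on a non-blocking socket until the kernel buffer is full.

    Returns (bytes_sent, blocked), where blocked tells whether
    BlockingIOError was raised before max_iterations was reached.
    """
    total = 0
    for _ in range(max_iterations):
        try:
            total += sock.send(chunk)
        except BlockingIOError:
            # A regular socket signals "would block" with an exception,
            # not with a return value of 0.
            return total, True
    return total, False


a, b = socket.socketpair()
a.setblocking(False)
sent, blocked = fill_until_blocked(a)  # nobody reads from b, so a fills up
a.close()
b.close()
```

The peer never reads, so the loop ends with BlockingIOError once the buffer is full; at no point does send() return 0 for a non-empty payload.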
msg213761 - (view) Author: Nikolaus Rath (nikratio) * Date: 2014-03-16 21:52
This actually seems to be not just an inconvenience, but a real bug: since SSLSocket.sendall() uses SSLSocket.send() internally, the former method will busy-loop when called on a non-blocking socket.

Note also that the .sendto and .write methods already behave consistently and raise SSLWantWrite. It seems it's really just the send() method that is the lone outlier.

The attached patch changes SSLSocket.send() to raise SSLWantWrite instead of returning zero. The full test suite still passes. I'm a bit sceptical though, because the code looks as if send() was deliberately written to catch the SSLWantWrite exception and return zero instead. Can anyone familiar with the code comment on this?
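
The busy-loop described in this message can be modelled without any networking. This is a simplified sketch, not the actual ssl.py source: naive_sendall and NotReadySender are hypothetical names, and the stub merely pretends that the socket is not writable for a fixed number of calls.

```python
def naive_sendall(send, data):
    """Simplified model of a sendall() loop built on top of send().

    Returns the number of send() calls that were needed.
    """
    count = 0
    calls = 0
    while count < len(data):
        # A return value of 0 makes no progress, so the loop spins
        # without ever waiting for the socket to become writable.
        count += send(data[count:])
        calls += 1
    return calls


class NotReadySender:
    """Hypothetical stand-in for a socket that is not writable at first."""

    def __init__(self, not_ready_calls):
        self.not_ready_calls = not_ready_calls

    def send(self, data):
        if self.not_ready_calls > 0:
            self.not_ready_calls -= 1
            return 0  # the "would block" case reported as zero bytes sent
        return len(data)


calls = naive_sendall(NotReadySender(10_000).send, b"hello")
```

Every zero return costs one wasted iteration; if send() raised instead, the loop would stop immediately and the caller could wait for the socket before retrying.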
msg213764 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2014-03-16 22:11
A little hg sleuthing (which I assume you did but I'll record for the record) reveals that this was introduced by Bill Janssen in changeset 8a281bfc058d.  Following the bugs mentioned in the checkin message, it looks like it *might* have been related to issue 1251, but there really isn't enough information in the issues or the checkin to tell for sure.  It certainly sounds like the problems mentioned in that issue may be relevant, though (the disconnection between the unencrypted data send and what actually gets placed on the wire and when).

I see you already added Bill Janssen to nosy, so that's probably the best bet for getting an answer, if we are lucky and he both responds and remembers :)
msg213766 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2014-03-16 22:26
It's probably too late to change this, unfortunately. There are non-blocking frameworks and libraries out there relying on the current behaviour.

As for sendall(), it doesn't really make sense on a non-blocking socket anyway.
msg213774 - (view) Author: Nikolaus Rath (nikratio) * Date: 2014-03-16 23:28
Antoine, do you know that there are frameworks out there using this, or is that a guess? asyncio, for example, seems to expect an SSLWantWrite exception as well. (It also works with a zero return, but it's not clear from the code whether that's by design or by chance.)
msg213776 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2014-03-16 23:35
> Antoine, do you know that there are frameworks out there using this,
> or is that a guess?

It's just a guess.
msg213778 - (view) Author: Nikolaus Rath (nikratio) * Date: 2014-03-16 23:40
Twisted does not seem to rely on it either (there's no mention of SSLWant* in the source at all, and without that, you can't possibly have support for non-blocking ssl sockets).
msg213779 - (view) Author: Nikolaus Rath (nikratio) * Date: 2014-03-16 23:53
gevent is calling _sslobject.write() directly, so it would not be affected by any change.
msg213780 - (view) Author: Nikolaus Rath (nikratio) * Date: 2014-03-16 23:57
Tornado uses SSLSocket.send(), and it looks as if a SSLWantWrite exception is not caught but would propagate, so this would probably break.
msg213782 - (view) Author: Nikolaus Rath (nikratio) * Date: 2014-03-17 00:03
More info on twisted: it uses PyOpenSSL rather than the stdlib ssl module, so it's not affected at all.
msg214042 - (view) Author: Nikolaus Rath (nikratio) * Date: 2014-03-19 01:16
Since this behavior cannot be changed without breaking third-party libraries (why did they work around this rather than reporting a bug?), I'd suggest documenting the current behavior and allowing programs to opt in to getting exceptions.

I've attached a patch to that end. Feedback would be appreciated.
msg214121 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2014-03-19 18:38
I don't think complicating the situation by exposing two different kinds of non-blocking sockets is the solution here.

Either we decide it is worth breaking compatibility and we change the behaviour by default (I'm rather against this), or we simply document the discrepancy.
msg214316 - (view) Author: Nikolaus Rath (nikratio) * Date: 2014-03-21 00:48
I'd like to argue with the wise words of Nick Coghlan here:

--snip--
There's a great saying in the usability world: "You can't document your way out of a usability problem". What it means is that if all the affordances of your application (or programming language!) push users towards a particular logical conclusion ([...]), having a caveat in your documentation isn't going to help, because people aren't even going to think to ask the question. It doesn't matter if you originally had a good reason for the behaviour, you've ended up in a place where your behaviour is confusing and inconsistent, because there is one piece of behaviour that is out of line with an otherwise consistent mental model. 
--snip--

This was said in the context of the bool(datetime.time) discussion, but I think it applies here as well. The rest of Python consistently raises an exception when something would block in non-blocking mode. This is reasonable behavior to expect. I agree that we shouldn't suddenly break this, but emitting a deprecation warning in Python 3.5 and changing the default in 3.6 seems reasonable to me. This is three years of transition time, and based on my random sampling so far, I doubt that there are a lot of affected modules or applications.
msg214772 - (view) Author: Nikolaus Rath (nikratio) * Date: 2014-03-25 02:04
(refreshed patch)
msg214847 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2014-03-25 19:45
> There's a great saying in the usability world: "You can't document 
> your way out of a usability problem".

However, adding a flag to change behaviour at runtime creates *another* usability problem. It's not obvious it would actually make things better (and implementors of async networking frameworks haven't asked for it, AFAICT).
msg214848 - (view) Author: Giampaolo Rodola' (giampaolo.rodola) * (Python committer) Date: 2014-03-25 19:58
-1 about adding a raise_on_blocking_send=False option as IMO it unnecessarily complicates the API.

Note: when working with plain sockets send() returning 0 means the connection has been closed by the other peer, same for os.sendfile().
It appears ssl module is the only one behaving differently therefore I'd be for signaling the discrepancy in the doc.
msg214876 - (view) Author: Ben Darnell (Ben.Darnell) Date: 2014-03-26 01:53
Giampaolo, where do you see that send() may return zero if the other side has closed?  I've always gotten an error in that case (EPIPE).

I vote -1 to adding a new flag to control whether it returns zero or raises and +0 to just fixing it in Python 3.5 (I don't think returning zero is an unreasonable thing to do; it's not obvious to me from send(2) that it is guaranteed to never return zero although I believe that to be the case).  It'll break Tornado, but there will be plenty of time to get a fix out before then.  If there were a convenient place to put a deprecation warning I'd vote to deprecate in 3.5 and fix in 3.6, but there's no good way for the application to signal that it expects a WANT_WRITE exception.

Another option may be to have SSLSocket.send() convert the WANT_WRITE exception into a socket.error with errno EAGAIN.  This wouldn't break Tornado and would make socket.send and SSLSocket.send more consistent, but it's weird to hide the true error like this.
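
The conversion option floated here can be sketched as a thin wrapper. This is only an illustration of the idea under discussion, not what was committed: send_as_eagain and always_want_write are hypothetical names, and the wrapper takes any send-like callable so that it can be tried without a real TLS connection.

```python
import errno
import ssl


def send_as_eagain(send, data):
    """Call send(data), re-raising SSLWantWriteError as an EAGAIN OSError."""
    try:
        return send(data)
    except ssl.SSLWantWriteError as exc:
        # OSError with errno EAGAIN is instantiated as BlockingIOError,
        # which is what a plain non-blocking socket raises.
        raise OSError(errno.EAGAIN, "SSL write would block") from exc


def always_want_write(data):
    """Hypothetical stand-in for an SSLSocket.send() that would block."""
    raise ssl.SSLWantWriteError()


caught = None
try:
    send_as_eagain(always_want_write, b"payload")
except BlockingIOError as exc:
    caught = exc
```

The original exception survives only as __cause__, so a caller that handles BlockingIOError generically no longer sees which SSL condition triggered it.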
msg214877 - (view) Author: Giampaolo Rodola' (giampaolo.rodola) * (Python committer) Date: 2014-03-26 02:35
Sorry, my fault. I got confused with os.sendfile() which returns 0 on EOF.
msg214878 - (view) Author: Nikolaus Rath (nikratio) * Date: 2014-03-26 02:51
On 03/25/2014 06:53 PM, Ben Darnell wrote:
> Another option may be to have SSLSocket.send() convert the WANT_WRITE exception into a socket.error with errno EAGAIN.  This wouldn't break Tornado and would make socket.send and SSLSocket.send more consistent, but it's weird to hide the true error like this.

I think that would only make sense if the SSLWant{Read/Write}Error exceptions are eliminated completely, so that all methods raise BlockingIOError (i.e. EAGAIN) instead.

Raising BlockingIOError is marginally better than returning zero, but I think not worth the change.
msg214889 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2014-03-26 10:34
> I vote -1 to adding a new flag to control whether it returns zero or
> raises and +0 to just fixing it in Python 3.5 (I don't think returning
> zero is an unreasonable thing to do; it's not obvious to me from
> send(2) that it is guaranteed to never return zero although I believe
> that to be the case).  It'll break Tornado, but there will be plenty
> of time to get a fix out before then.

If that's your opinion then I'm inclined to trust you.

> Another option may be to have SSLSocket.send() convert the WANT_WRITE
> exception into a socket.error with errno EAGAIN. 

I don't think it's a good idea, since it hides the true reason of the
error (also, it suppresses the distinction between WANT_READ and
WANT_WRITE, which tells you whether you need to select() the socket for
reading or writing).
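
The distinction drawn here can be shown as a small helper that picks the select() direction from the exception type. This is a minimal sketch: wait_until_retryable is a hypothetical name, and a plain socket pair with hand-constructed exceptions stands in for a real non-blocking TLS connection.

```python
import select
import socket
import ssl


def wait_until_retryable(sock, exc, timeout=None):
    """Wait in the direction the SSL layer asked for.

    Returns True if the socket became ready before the timeout.
    """
    if isinstance(exc, ssl.SSLWantReadError):
        readable, _, _ = select.select([sock], [], [], timeout)
        return bool(readable)
    if isinstance(exc, ssl.SSLWantWriteError):
        _, writable, _ = select.select([], [sock], [], timeout)
        return bool(writable)
    raise exc  # any other error is not a "try again later" condition


# Demonstration on a plain socket pair (an assumption for illustration).
a, b = socket.socketpair()
write_ready = wait_until_retryable(a, ssl.SSLWantWriteError(), timeout=1.0)
read_ready_before = wait_until_retryable(a, ssl.SSLWantReadError(), timeout=0)
b.send(b"x")  # now there is something for a to read
read_ready_after = wait_until_retryable(a, ssl.SSLWantReadError(), timeout=1.0)
a.close()
b.close()
```

A send() can raise SSLWantReadError as well, for example when the SSL layer needs to read handshake data from the peer before it can write, which is why collapsing both cases into one error would lose information.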
msg214931 - (view) Author: Nikolaus Rath (nikratio) * Date: 2014-03-27 03:48
As an alternative, I have attached a pure docpatch that just documents the future behavior.

Someone with commit privileges: please take your pick :-).
msg217322 - (view) Author: Nikolaus Rath (nikratio) * Date: 2014-04-27 22:40
As discussed on python-dev, here is a patch that changes the behavior of send() and sendall() to raise SSLWant* exceptions instead of returning zero.
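
For code that has to run both before and after this change, a caller could treat the zero return and the exceptions alike. The following is a sketch under that assumption: try_send and FakeSocket are hypothetical names, and the stub socket only replays a fixed outcome.

```python
import ssl


def try_send(sock, data):
    """Attempt one non-blocking send.

    Returns (bytes_sent, wait_for), where wait_for is None on progress,
    or "read" / "write" to say which readiness event to wait for.
    """
    try:
        sent = sock.send(data)
    except ssl.SSLWantReadError:
        return 0, "read"
    except (ssl.SSLWantWriteError, BlockingIOError):
        return 0, "write"
    if sent == 0 and data:
        # Old SSLSocket.send() behaviour: would-block reported as zero.
        return 0, "write"
    return sent, None


class FakeSocket:
    """Hypothetical stub whose send() returns or raises a fixed outcome."""

    def __init__(self, outcome):
        self.outcome = outcome

    def send(self, data):
        if isinstance(self.outcome, Exception):
            raise self.outcome
        return self.outcome


results = [
    try_send(FakeSocket(5), b"hello"),
    try_send(FakeSocket(0), b"hello"),
    try_send(FakeSocket(ssl.SSLWantWriteError()), b"hello"),
    try_send(FakeSocket(ssl.SSLWantReadError()), b"hello"),
]
```

The zero-return branch assumes the socket should be awaited for writing, since the old return value carried no direction information.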
msg217483 - (view) Author: Roundup Robot (python-dev) Date: 2014-04-29 08:03
New changeset 3cf067049211 by Antoine Pitrou in branch 'default':
Issue #20951: SSLSocket.send() now raises either SSLWantReadError or SSLWantWriteError on a non-blocking socket if the operation would block. Previously, it would return 0.
http://hg.python.org/cpython/rev/3cf067049211
msg217484 - (view) Author: Roundup Robot (python-dev) Date: 2014-04-29 08:06
New changeset b0f6983d63df by Antoine Pitrou in branch 'default':
Add porting note for issue #20951.
http://hg.python.org/cpython/rev/b0f6983d63df
msg217485 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2014-04-29 08:06
Patch finally committed. Thanks Nikolaus!
msg217489 - (view) Author: Roundup Robot (python-dev) Date: 2014-04-29 08:27
New changeset 7f50e1836ddb by Antoine Pitrou in branch 'default':
Fix failure in test_poplib after issue #20951.
http://hg.python.org/cpython/rev/7f50e1836ddb
msg217492 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2014-04-29 08:28
Ok, there was a failure in test_poplib when run with -unetwork, I fixed it.
msg217567 - (view) Author: Nikolaus Rath (nikratio) * Date: 2014-04-30 02:58
Antoine, are you sure this was a problem related to this patch?

The test seems to work just fine for me:

$ hg update -C -r b0f6983d63df
$ make clean
$ ./configure --with-pydebug && make -j1
$ ./python -m test -u network,urlfetch -j 8 test_poplib
[1/1] test_poplib
1 test OK.

Am I doing something wrong?
msg217583 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2014-04-30 09:00
> Am I doing something wrong?

I can reproduce the failure here.
There might be different behaviour across OpenSSL versions (mine is
1.0.1e).
msg217673 - (view) Author: Nikolaus Rath (nikratio) * Date: 2014-05-01 01:00
Maybe. I have 1.0.1g. Could you maybe post the output of the failed test? I'd like to understand how the patch broke the test (looking at your patch alone didn't tell me much).
msg217689 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2014-05-01 12:09
Actually, the test hangs after one of the threads crashes:

test__all__ (test.test_poplib.TestPOP3_SSLClass) ... Exception in thread Thread-23:
Traceback (most recent call last):
  File "/home/antoine/cpython/default/Lib/threading.py", line 920, in _bootstrap_inner
    self.run()
  File "/home/antoine/cpython/default/Lib/test/test_poplib.py", line 218, in run
    asyncore.loop(timeout=0.1, count=1)
  File "/home/antoine/cpython/default/Lib/asyncore.py", line 212, in loop
    poll_fun(timeout, map)
  File "/home/antoine/cpython/default/Lib/asyncore.py", line 153, in poll
    read(obj)
  File "/home/antoine/cpython/default/Lib/asyncore.py", line 87, in read
    obj.handle_error()
  File "/home/antoine/cpython/default/Lib/asyncore.py", line 83, in read
    obj.handle_read_event()
  File "/home/antoine/cpython/default/Lib/asyncore.py", line 422, in handle_read_event
    self.handle_accept()
  File "/home/antoine/cpython/default/Lib/asyncore.py", line 499, in handle_accept
    self.handle_accepted(*pair)
  File "/home/antoine/cpython/default/Lib/test/test_poplib.py", line 228, in handle_accepted
    self.handler_instance = self.handler(conn)
  File "/home/antoine/cpython/default/Lib/test/test_poplib.py", line 368, in __init__
    self.push('+OK dummy pop3 server ready. <timestamp>')
  File "/home/antoine/cpython/default/Lib/test/test_poplib.py", line 82, in push
    asynchat.async_chat.push(self, data.encode("ISO-8859-1") + b'\r\n')
  File "/home/antoine/cpython/default/Lib/asynchat.py", line 190, in push
    self.initiate_send()
  File "/home/antoine/cpython/default/Lib/asynchat.py", line 243, in initiate_send
    self.handle_error()
  File "/home/antoine/cpython/default/Lib/asynchat.py", line 241, in initiate_send
    num_sent = self.send(data)
  File "/home/antoine/cpython/default/Lib/asyncore.py", line 366, in send
    result = self.socket.send(data)
  File "/home/antoine/cpython/default/Lib/ssl.py", line 667, in send
    return self._sslobj.write(data)
ssl.SSLWantReadError: The operation did not complete (read) (_ssl.c:1636)


This was due to a simplistic handling of asyncore SSL connections in test_poplib, which I've fixed by reusing the code from test_ftplib.
History
Date                 User              Action  Args
2014-05-01 12:09:37  pitrou            set     messages: + msg217689
2014-05-01 01:00:06  nikratio          set     messages: + msg217673
2014-04-30 09:00:57  pitrou            set     messages: + msg217583
2014-04-30 02:58:01  nikratio          set     messages: + msg217567
2014-04-29 08:28:38  pitrou            set     messages: + msg217492
2014-04-29 08:27:24  python-dev        set     messages: + msg217489
2014-04-29 08:06:41  pitrou            set     status: open -> closed; resolution: fixed; messages: + msg217485; stage: patch review -> resolved
2014-04-29 08:06:07  python-dev        set     messages: + msg217484
2014-04-29 08:03:36  python-dev        set     nosy: + python-dev; messages: + msg217483
2014-04-29 02:51:44  nikratio          set     files: + issue20951_r3.diff
2014-04-27 22:40:54  nikratio          set     files: + issue20951_r2.diff; messages: + msg217322
2014-03-27 03:48:37  nikratio          set     files: + docpatch.diff; messages: + msg214931
2014-03-26 10:34:18  pitrou            set     messages: + msg214889
2014-03-26 02:51:06  nikratio          set     messages: + msg214878
2014-03-26 02:35:33  giampaolo.rodola  set     messages: + msg214877
2014-03-26 01:53:57  Ben.Darnell       set     nosy: + Ben.Darnell; messages: + msg214876
2014-03-25 19:58:03  giampaolo.rodola  set     messages: + msg214848
2014-03-25 19:45:10  pitrou            set     messages: + msg214847
2014-03-25 02:04:20  nikratio          set     files: + deprecation_patch.diff; messages: + msg214772
2014-03-21 00:57:18  nikratio          set     files: + issue20951.diff
2014-03-21 00:57:08  nikratio          set     files: - issue20951.diff
2014-03-21 00:48:13  nikratio          set     messages: + msg214316
2014-03-21 00:43:11  nikratio          set     files: + issue20951.diff
2014-03-19 18:38:28  pitrou            set     messages: + msg214121
2014-03-19 01:33:47  r.david.murray    set     type: behavior -> enhancement; stage: patch review
2014-03-19 01:16:35  nikratio          set     files: + issue20951.diff; messages: + msg214042
2014-03-17 00:03:18  nikratio          set     messages: + msg213782
2014-03-16 23:57:12  nikratio          set     messages: + msg213780
2014-03-16 23:53:35  nikratio          set     messages: + msg213779
2014-03-16 23:40:27  nikratio          set     messages: + msg213778
2014-03-16 23:35:16  pitrou            set     messages: + msg213776
2014-03-16 23:28:08  nikratio          set     messages: + msg213774
2014-03-16 22:26:57  pitrou            set     messages: + msg213766
2014-03-16 22:11:31  r.david.murray    set     nosy: + r.david.murray; messages: + msg213764
2014-03-16 21:52:12  nikratio          set     files: + issue20951.diff; keywords: + patch; messages: + msg213761
2014-03-16 21:16:31  nikratio          create