This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: test_ftplib fails with OpenSSL 0.9.8m
Type: crash Stage: resolved
Components: Extension Modules Versions: Python 3.1, Python 3.2, Python 2.7, Python 2.6
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: pitrou Nosy List: Arfrever, dlmiles, flox, giampaolo.rodola, janssen, jcea, max0x7ba, pitrou, vstinner
Priority: critical Keywords: buildbot, patch

Created on 2010-03-10 13:44 by flox, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
ssl_shutdown.patch vstinner, 2010-03-12 00:53
testftpd.patch giampaolo.rodola, 2010-03-12 22:01
issue8108_openssl_098m.diff flox, 2010-03-25 00:28 Patch, apply to 2.x
issue8108_print_ssl_want_read.diff flox, 2010-03-25 14:11 debugging on 2.x
issue8108_ftplib_ssl_098m_v2.diff flox, 2010-03-25 14:52 Patch, apply to 2.x
newssl.patch pitrou, 2010-04-06 10:46
newssl2.patch pitrou, 2010-04-06 11:39
newssl3.patch pitrou, 2010-04-06 13:17
newssl4.patch pitrou, 2010-04-08 18:24
newssl5.patch pitrou, 2010-04-08 20:15
python_ssl_v2.c.txt dlmiles, 2010-04-09 16:46 Revised newssl5.patch
Modules__ssl.c.patch dlmiles, 2010-04-11 16:47 Patch for Modules/_ssl.c (still need newssl5.patch for test_ftplib.py fixes)
newssl6.patch pitrou, 2010-04-16 19:32
Messages (66)
msg100782 - (view) Author: Florent Xicluna (flox) * (Python committer) Date: 2010-03-10 13:44
There's such failure on trunk and py3k when openssl 0.9.8m is installed.
(Debian platform)
No failure with 0.9.8k.

test_ftplib
Exception in thread Thread-40:
Traceback (most recent call last):
  File "./Lib/threading.py", line 530, in __bootstrap_inner
    self.run()
  File "./Lib/test/test_ftplib.py", line 223, in run
    asyncore.loop(timeout=0.1, count=1)
  File "./Lib/asyncore.py", line 211, in loop
    poll_fun(timeout, map)
  File "./Lib/asyncore.py", line 154, in poll
    write(obj)
  File "./Lib/asyncore.py", line 88, in write
    obj.handle_error()
  File "./Lib/asyncore.py", line 84, in write
    obj.handle_write_event()
  File "./Lib/test/test_ftplib.py", line 290, in handle_write_event
    super(SSLConnection, self).handle_write_event()
  File "./Lib/asyncore.py", line 440, in handle_write_event
    self.handle_write()
  File "./Lib/asynchat.py", line 174, in handle_write
    self.initiate_send()
  File "./Lib/asynchat.py", line 215, in initiate_send
    self.handle_close()
  File "./Lib/test/test_ftplib.py", line 43, in handle_close
    self.close()
  File "./Lib/test/test_ftplib.py", line 316, in close
    self.socket.unwrap()
  File "./Lib/ssl.py", line 272, in unwrap
    s = self._sslobj.shutdown()
SSLError: [Errno 2] _ssl.c:1367: The operation did not complete (read)
msg100783 - (view) Author: Florent Xicluna (flox) * (Python committer) Date: 2010-03-10 13:47
It occurs only for trunk and 3.x, where the "ftp over ssl" is implemented. See #2054.

And the Debian alpha buildbots have the same library 0.9.8m:
http://www.python.org/dev/buildbot/all/builders/alpha%20Debian%20trunk
http://www.python.org/dev/buildbot/all/builders/alpha%20Debian%203.x
msg100784 - (view) Author: Florent Xicluna (flox) * (Python committer) Date: 2010-03-10 13:49
OpenSSL changelog (scroll to 0.9.8m): http://www.openssl.org/news/changelog.html
msg100888 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2010-03-11 21:39
The problem comes from SSL_Shutdown() on a non blocking connection. See:
http://www.mail-archive.com/openssl-dev@openssl.org/msg24097.html
msg100889 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2010-03-11 21:48
The patch changing SSL_Shutdown() behaviour on non blocking socket was written in 2006 by Darryl Miles and it will commited into 0.9.8m. The original patch:
http://marc.info/?l=openssl-dev&m=115153998821797&w=2

In our testcase, SSL_Shutdown() fails with SSL_ERROR_WANT_READ, which means (extract of the email): "Return -1 WANT_READ while we are still waiting for the inbound recv notify alert to be received."
msg100892 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2010-03-11 23:06
More informations:
http://www.openssl.org/docs/ssl/SSL_shutdown.html
http://rt.openssl.org/Ticket/Display.html?user=guest&pass=guest&id=1891

Extract of the manual page:

 
 If the underlying BIO is non-blocking, SSL_shutdown() will also return when the underlying BIO could not satisfy the needs of SSL_shutdown() to continue the handshake. In this case a call to SSL_get_error() with the return value of SSL_shutdown() will yield SSL_ERROR_WANT_READ or SSL_ERROR_WANT_WRITE. ***The calling process then must repeat the call after taking appropriate action to satisfy the needs of SSL_shutdown().*** The action depends on the underlying BIO. When using a non-blocking socket, nothing is to be done, but select() can be used to check for the required condition. When using a buffering BIO, like a BIO pair, data must be written into or retrieved out of the BIO before being able to continue.
msg100899 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2010-03-12 00:53
There are different kind of bugs, but all of them are related to SSL_Shutdown(). It depends on the socket timeout:
 - blocking socket: nothing special (no change from previous OpenSSL version)
 - timeout or non blocking socket: if SSL_shutdown() returns -1 and SSL_get_error()==SSL_ERROR_WANT_READ, we have to read bytes from the socket (to read the "close" notify acknowledge?)

I did't understood how much bytes have to be read, nor if we have to check the socket status using select() or not.

Attached patch is a draft to demonstrate how the bug have to be fixed, but I don't know OpenSSL nor asynchat enough to write a good fix. TODO: unwrap() returns None on SSL_shutdown() error and if the socket has been closed.
msg100958 - (view) Author: Bill Janssen (janssen) * (Python committer) Date: 2010-03-12 19:23
Victor, I'll take a look.
msg100961 - (view) Author: Bill Janssen (janssen) * (Python committer) Date: 2010-03-12 19:48
What's happening is that the new state returns from SSL_shutdown() are saying, "the shutdown you asked for didn't happen this time, but call me again when you get a chance.  And here is a hint about why it didn't happen, so that if you are sophisticated enough, you can get better information about whether subsequent calls to shutdown() will succeed."

Given the underlying architectures we are dealing with, is it possible to handle this properly?  Basically, what the SSL_shutdown code is saying is that all higher-level event loops should understand an additional kind of event, "shutting down the SSL socket", on a par with "read bytes" and "write bytes".  But our event processing handlers, like asyncore, basically only deal with read and write.  I don't see how to tell them that a "close" operation should turn into a "read" or "write".

We could also think about turning non-blocking sockets into blocking sockets, by adding a time.sleep(0.1) and re-calling SSL_shutdown().  But maybe the right thing to do would be for the write handler in the asyncore loop to catch this, and re-try the write when it gets a chance.
msg100962 - (view) Author: Giampaolo Rodola' (giampaolo.rodola) * (Python committer) Date: 2010-03-12 20:05
I'm not sure this should be handled in ssl.py.
ssl.py correctly raises ERROR_WANT_READ/WRITE if the shutdown operation didn't complete and that's ok.
It should be up to the upper application (in our case asyncore) to deal with that.

Moreover, the patch tries to solve the issue by using a while loop which is *evil* for asynchronous environments (a similar problem is debated in issue 3890).

My suggestion is to modify the SSLConnection.close() method of the test server so that it just "return" in case of ERROR_WANT_READ/WRITE errors without calling the original asyncore's close method since the socket object must remain in the select loop as long as the connection is still around.

A subsequent call to handle_read() and/or handle_write() should not be necessary as they should automatically be called by asyncore on the next poll() loop but I'm not 100% sure about this. 

In both cases I'd say that the only thing needed to be fixed here is the test server.
msg100963 - (view) Author: Bill Janssen (janssen) * (Python committer) Date: 2010-03-12 20:10
Looking at this further, what we're seeing is the trace of an ineffective attempt to handle an exception presumably raised from the FTP code.  Can we see that exception?  What's the actual state of the TCP connection at this point?
msg100964 - (view) Author: Bill Janssen (janssen) * (Python committer) Date: 2010-03-12 20:18
I think Giampaolo is right about this not being an ssl.py issue.  Could the exception handling in ftp.py perhaps also be made more sophisticated?  I'd really love to understand what the state of the TCP connection is here.  I'm presuming that it's still open, because otherwise you'd get a different error from OpenSSL.  So what's the error that's triggering this call?  Is it simply that the sender has run out of stuff to send?

Also, I think that asyncore.py is stingy about what information it sends.  At the very least, it could invoke handle_error() with the exception it's handling as an argument.
msg100965 - (view) Author: Bill Janssen (janssen) * (Python committer) Date: 2010-03-12 20:19
And it would be interesting to know why all the SSL module tests don't fail in the same way.
msg100977 - (view) Author: Giampaolo Rodola' (giampaolo.rodola) * (Python committer) Date: 2010-03-12 22:01
> I'd really love to understand what the state of the TCP connection 
> is here. I'm presuming that it's still open, because otherwise you'd 
> get a different error from OpenSSL.  So what's the error that's 
> triggering this call?  Is it simply that the sender has run out of 
> stuff to send?

"almost closed"? :)
I don't know but I wouldn't label this as an "unexpected error" because when dealing with non blocking sockets errors like this one are perfectly legitimate as they "tell" the upper application what it is expected to do (for example EWOULDBLOCK on send() means "retry later").

A patch is in attachment but I have not been able to reproduce this issue on python 2.7 r78878 on Ubuntu 8.04 so it might not work (I suspect it might end up in an endless loop).

If this doesn't work we can try the second option of calling handle_read/write right after shutdown().
msg100980 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2010-03-12 23:24
I posted on openssl-users mailing list. The author of patch (change SSL_shutdown() behaviour for non blocking socket) replied!
http://marc.info/?l=openssl-users&m=126838806919896&w=2
msg100985 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-03-13 02:58
> I posted on openssl-users mailing list. The author of patch (change
> SSL_shutdown() behaviour for non blocking socket) replied!
> http://marc.info/?l=openssl-users&m=126838806919896&w=2

Interesting read, thank you.
AFAICT, his proposed "quick fix" snippet should be good enough for us.
Also, it seems the new "graceful shutdown" logic is really trying to
handle at the transport level things which should be handled by the
application protocol (detecting whether the other end still has things
to send).
msg101065 - (view) Author: Bill Janssen (janssen) * (Python committer) Date: 2010-03-14 19:44
> AFAICT, his proposed "quick fix" snippet should be good enough for us.

Depends on what we want.  It just suppresses information that's now available.  What we'd really like is for the caller to recognize that close() can fail, and should be re-tried if it does.  That requires that we signal the error back up and out of the ssl module.  It seems to me that any non-blocking code should recognize this and respect it.

Again, why does this failure only show up in the FTP test?  Why aren't we seeing it elsewhere?
msg101066 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2010-03-14 20:02
> close() can fail

Not close, unwrap() (shutdown()) ;-)

> Again, why does this failure only show up in the FTP test?

Maybe because only the FTP test uses an SSL socket in non blocking mode.
msg101070 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-03-14 22:42
> Depends on what we want.  It just suppresses information that's now
> available.  What we'd really like is for the caller to recognize that
> close() can fail, and should be re-tried if it does.  That requires
> that we signal the error back up and out of the ssl module.  It seems
> to me that any non-blocking code should recognize this and respect it.

Well, first, it will break code which used to work perfectly well (as the failing tests show). That OpenSSL decided to break compatibility in what looks like a minor bugfix release (0.9.8m) doesn't mean we should propagate the breakage.

Second, I'm not an SSL expert, but it's not obvious to me that the application should have to ensure a "complete shutdown" before closing the socket. Often, stateful applicative protocols close their "session" anyway (*) before shutting down the transport layer, so I don't see what this additional precaution can buy (apart from code complication).

(*) for example, FTP has the "QUIT" command
msg101073 - (view) Author: Bill Janssen (janssen) * (Python committer) Date: 2010-03-15 00:05
> Maybe because only the FTP test uses an SSL socket in non blocking mode.

No, the SSL unit tests also do this.  I think Giampaolo is right, what we're seeing is bad error handling in the FTP test code.
msg101074 - (view) Author: Bill Janssen (janssen) * (Python committer) Date: 2010-03-15 00:11
>> Depends on what we want.  It just suppresses information that's now
>> available.  What we'd really like is for the caller to recognize that
>> close() can fail, and should be re-tried if it does.  That requires
>> that we signal the error back up and out of the ssl module.  It seems
>> to me that any non-blocking code should recognize this and respect it.

> Well, first, it will break code which used to work perfectly well (as > the failing tests show). That OpenSSL decided to break compatibility
> in what looks like a minor bugfix release (0.9.8m) doesn't mean we
> should propagate the breakage.

I admit to being powerfully swayed by this argument.  Of course, you could also say that it exposes bugs in code which hasn't been tested enough.  Again, look at where this stack trace is coming from.  It's a default error handler in code that was written 10 years ago.  Perhaps that code needs more than just a default error handler; perhaps the handler in FTP_TLS should be more sophisticated.

> Second, I'm not an SSL expert, but it's not obvious to me that the
> application should have to ensure a "complete shutdown" before
> closing the socket. Often, stateful applicative protocols close
> their "session" anyway (*) before shutting down the transport layer,
> so I don't see what this additional precaution can buy
> (apart from code complication).

What it buys is the ability for careful code to properly shut down an SSL session when using non-blocking sockets.  If we apply the fix, we deny that, and I'm sure someone will immediately file a bug about it :-).
msg101517 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-03-22 15:58
Ok, I finally agree that test_ftplib should be patched.
The patch isn't exactly correct: it should ideally retry the unwrap() call later, rather than simply ignore the error. But since it's just used for testing, it looks sufficient.

(we probably lack a higher-level abstraction around ssl sockets, especially non-blocking ones; there's too many domain-specific stuff to know for the average programmer.)
msg101534 - (view) Author: Giampaolo Rodola' (giampaolo.rodola) * (Python committer) Date: 2010-03-22 21:14
> The patch isn't exactly correct: it should ideally retry the unwrap() 
> call later, rather than simply ignore the error. But since it's just 
> used for testing, it looks sufficient.

Actually unwrap() should already be called.
What the patch does is returning in case of ERROR_WANT_READ/WRITE, and *explicitly avoids to close() the connection when such an event occurs*.

This should grant that another asyncore poll() loop is executed, handle_close() automatically be called again and so unwrap().

As I said in comment #100977 I'm not 100% sure of this since I've not been able to reproduce the issue. I'm just presuming this as I know that asyncore works like this with plain-text sockets.

I'd like to make sure this works as expected even if it's just a test.
If some of the folks able to reproduce this issue could confirm that unwrap() gets called again it would be great.
msg101654 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-03-24 22:09
Giampaolo, I tried committing your patch but the buildbot still fails, with the following errors:

test_auth_issued_twice (test.test_ftplib.TestTLS_FTPClass) ... ok
test_auth_ssl (test.test_ftplib.TestTLS_FTPClass) ... ok
test_control_connection (test.test_ftplib.TestTLS_FTPClass) ... ok
test_data_connection (test.test_ftplib.TestTLS_FTPClass) ... Exception in thread Thread-462:
Traceback (most recent call last):
  File "/home/doko/buildarea/trunk.klose-debian-alpha/build/Lib/threading.py", line 530, in __bootstrap_inner
    self.run()
  File "/home/doko/buildarea/trunk.klose-debian-alpha/build/Lib/test/test_ftplib.py", line 223, in run
    asyncore.loop(timeout=0.1, count=1)
  File "/home/doko/buildarea/trunk.klose-debian-alpha/build/Lib/asyncore.py", line 211, in loop
    poll_fun(timeout, map)
  File "/home/doko/buildarea/trunk.klose-debian-alpha/build/Lib/asyncore.py", line 154, in poll
    write(obj)
  File "/home/doko/buildarea/trunk.klose-debian-alpha/build/Lib/asyncore.py", line 88, in write
    obj.handle_error()
  File "/home/doko/buildarea/trunk.klose-debian-alpha/build/Lib/asyncore.py", line 84, in write
    obj.handle_write_event()
  File "/home/doko/buildarea/trunk.klose-debian-alpha/build/Lib/test/test_ftplib.py", line 290, in handle_write_event
    super(SSLConnection, self).handle_write_event()
  File "/home/doko/buildarea/trunk.klose-debian-alpha/build/Lib/asyncore.py", line 440, in handle_write_event
    self.handle_write()
  File "/home/doko/buildarea/trunk.klose-debian-alpha/build/Lib/asynchat.py", line 174, in handle_write
    self.initiate_send()
  File "/home/doko/buildarea/trunk.klose-debian-alpha/build/Lib/asynchat.py", line 215, in initiate_send
    self.handle_close()
  File "/home/doko/buildarea/trunk.klose-debian-alpha/build/Lib/test/test_ftplib.py", line 43, in handle_close
    self.close()
  File "/home/doko/buildarea/trunk.klose-debian-alpha/build/Lib/test/test_ftplib.py", line 323, in close
    self.socket.unwrap()
  File "/home/doko/buildarea/trunk.klose-debian-alpha/build/Lib/ssl.py", line 258, in unwrap
    s = self._sslobj.shutdown()
error: [Errno 32] Broken pipe

ERROR
test_login (test.test_ftplib.TestTLS_FTPClass) ... ok

======================================================================
ERROR: test_nlst (test.test_ftplib.TestTLS_FTPClassMixin)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/doko/buildarea/trunk.klose-debian-alpha/build/Lib/test/test_ftplib.py", line 515, in test_nlst
    self.assertEqual(self.client.nlst(), NLST_DATA.split('\r\n')[:-1])
  File "/home/doko/buildarea/trunk.klose-debian-alpha/build/Lib/ftplib.py", line 495, in nlst
    self.retrlines(cmd, files.append)
  File "/home/doko/buildarea/trunk.klose-debian-alpha/build/Lib/ftplib.py", line 695, in retrlines
    conn = self.transfercmd(cmd)
  File "/home/doko/buildarea/trunk.klose-debian-alpha/build/Lib/ftplib.py", line 357, in transfercmd
    return self.ntransfercmd(cmd, rest)[0]
  File "/home/doko/buildarea/trunk.klose-debian-alpha/build/Lib/ftplib.py", line 670, in ntransfercmd
    conn, size = FTP.ntransfercmd(self, cmd, rest)
  File "/home/doko/buildarea/trunk.klose-debian-alpha/build/Lib/ftplib.py", line 324, in ntransfercmd
    host, port = self.makepasv()
  File "/home/doko/buildarea/trunk.klose-debian-alpha/build/Lib/ftplib.py", line 302, in makepasv
    host, port = parse227(self.sendcmd('PASV'))
  File "/home/doko/buildarea/trunk.klose-debian-alpha/build/Lib/ftplib.py", line 786, in parse227
    raise error_reply, resp
error_reply: 226 transfer complete

======================================================================
ERROR: test_retrbinary_rest (test.test_ftplib.TestTLS_FTPClassMixin)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/doko/buildarea/trunk.klose-debian-alpha/build/Lib/test/test_ftplib.py", line 475, in test_retrbinary_rest
    self.client.retrbinary('retr', received.append, rest=rest)
  File "/home/doko/buildarea/trunk.klose-debian-alpha/build/Lib/ftplib.py", line 678, in retrbinary
    conn = self.transfercmd(cmd, rest)
  File "/home/doko/buildarea/trunk.klose-debian-alpha/build/Lib/ftplib.py", line 357, in transfercmd
    return self.ntransfercmd(cmd, rest)[0]
  File "/home/doko/buildarea/trunk.klose-debian-alpha/build/Lib/ftplib.py", line 670, in ntransfercmd
    conn, size = FTP.ntransfercmd(self, cmd, rest)
  File "/home/doko/buildarea/trunk.klose-debian-alpha/build/Lib/ftplib.py", line 324, in ntransfercmd
    host, port = self.makepasv()
  File "/home/doko/buildarea/trunk.klose-debian-alpha/build/Lib/ftplib.py", line 302, in makepasv
    host, port = parse227(self.sendcmd('PASV'))
  File "/home/doko/buildarea/trunk.klose-debian-alpha/build/Lib/ftplib.py", line 786, in parse227
    raise error_reply, resp
error_reply: 200 type ok

======================================================================
ERROR: test_storbinary (test.test_ftplib.TestTLS_FTPClassMixin)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/doko/buildarea/trunk.klose-debian-alpha/build/Lib/test/test_ftplib.py", line 488, in test_storbinary
    self.client.storbinary('stor', f)
  File "/home/doko/buildarea/trunk.klose-debian-alpha/build/Lib/ftplib.py", line 727, in storbinary
    conn.unwrap()
  File "/home/doko/buildarea/trunk.klose-debian-alpha/build/Lib/ssl.py", line 258, in unwrap
    s = self._sslobj.shutdown()
SSLError: [Errno 2] _ssl.c:1367: The operation did not complete (read)

======================================================================
ERROR: test_storbinary_rest (test.test_ftplib.TestTLS_FTPClassMixin)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/doko/buildarea/trunk.klose-debian-alpha/build/Lib/test/test_ftplib.py", line 500, in test_storbinary_rest
    self.client.storbinary('stor', f, rest=r)
  File "/home/doko/buildarea/trunk.klose-debian-alpha/build/Lib/ftplib.py", line 727, in storbinary
    conn.unwrap()
  File "/home/doko/buildarea/trunk.klose-debian-alpha/build/Lib/ssl.py", line 258, in unwrap
    s = self._sslobj.shutdown()
SSLError: [Errno 2] _ssl.c:1367: The operation did not complete (read)

======================================================================
ERROR: test_storlines (test.test_ftplib.TestTLS_FTPClassMixin)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/doko/buildarea/trunk.klose-debian-alpha/build/Lib/test/test_ftplib.py", line 505, in test_storlines
    self.client.storlines('stor', f)
  File "/home/doko/buildarea/trunk.klose-debian-alpha/build/Lib/ftplib.py", line 746, in storlines
    conn.unwrap()
  File "/home/doko/buildarea/trunk.klose-debian-alpha/build/Lib/ssl.py", line 258, in unwrap
    s = self._sslobj.shutdown()
SSLError: [Errno 2] _ssl.c:1367: The operation did not complete (read)

======================================================================
ERROR: test_data_connection (test.test_ftplib.TestTLS_FTPClass)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/doko/buildarea/trunk.klose-debian-alpha/build/Lib/test/test_ftplib.py", line 619, in test_data_connection
    self.client.prot_c()
  File "/home/doko/buildarea/trunk.klose-debian-alpha/build/Lib/ftplib.py", line 663, in prot_c
    resp = self.voidcmd('PROT C')
  File "/home/doko/buildarea/trunk.klose-debian-alpha/build/Lib/ftplib.py", line 249, in voidcmd
    return self.voidresp()
  File "/home/doko/buildarea/trunk.klose-debian-alpha/build/Lib/ftplib.py", line 224, in voidresp
    resp = self.getresp()
  File "/home/doko/buildarea/trunk.klose-debian-alpha/build/Lib/ftplib.py", line 210, in getresp
    resp = self.getmultiline()
  File "/home/doko/buildarea/trunk.klose-debian-alpha/build/Lib/ftplib.py", line 196, in getmultiline
    line = self.getline()
  File "/home/doko/buildarea/trunk.klose-debian-alpha/build/Lib/ftplib.py", line 183, in getline
    line = self.file.readline()
  File "/home/doko/buildarea/trunk.klose-debian-alpha/build/Lib/socket.py", line 445, in readline
    data = self._sock.recv(self._rbufsize)
timeout: timed out
msg101661 - (view) Author: Florent Xicluna (flox) * (Python committer) Date: 2010-03-25 00:28
This patch solves the problem with 0.9.8m on my Debian workstation.

Actually there's 4 kind of errors on shutdown():
 - SSL_ERROR_WANT_READ
 - SSL_ERROR_WANT_WRITE
 - socket.error: errno.PIPE (32)
 - socket.error: 0

Since the command is "shutdown()", the patch silence all the socket errors, because they are probably due to the remote side closing the connection abruptly.
See the details on the Darryl's analysis:
http://bugs.python.org/issue8108#msg100980

This patch is very closed to Darryl's proposal.
msg101677 - (view) Author: Giampaolo Rodola' (giampaolo.rodola) * (Python committer) Date: 2010-03-25 08:31
@Antoine Pitrou: ok, I think I'll have to be able to replicate the error in order to try to fix this. I'm gonna try on FreeBSD and another Linux box later today.
msg101678 - (view) Author: Florent Xicluna (flox) * (Python committer) Date: 2010-03-25 08:32
I don't know if it is better to fix it in the ftplib or the ssl module.

The previous patch fixes it at the SSL level (stdlib only).
This patch implements the fix at the ftplib level (stdlib + test).
msg101694 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-03-25 13:30
Florent, you shouldn't need to silence socket.error at this level.

I think what needs to happen is that, when unwrap() fails in SSLConnection.close(), handle_read() and handle_write() retry the unwrap() call. That's what I gather from the explanation of the new behaviour.
msg101699 - (view) Author: Florent Xicluna (flox) * (Python committer) Date: 2010-03-25 14:11
You're right, there's no need to catch the socket error in "ftplib" module.

Here is the result of the "debugging" patch in the console:

$ ./python -m test.regrtest -uall test_ftplib
test_ftplib
success
success
success
success
success
success
success
success
2702 ssl want read
2801 ssl want read
2661 ssl want read
2274 ssl want read
16742 ssl want read
2397 ssl want read
1 test OK.



IMHO, there's no need to loop on "unwrap()", according to Darryl's analysis.
msg101704 - (view) Author: Giampaolo Rodola' (giampaolo.rodola) * (Python committer) Date: 2010-03-25 14:50
Florent can you give me a clue on how to make python use OpenSSL 0.9.8k on Ubuntu?
Despite I compiled and installed 0.9.8k from sources the system version keeps being 0.9.8g and I guess that Python is using that one.
Do I have to recompile Python by using a directive to tell what specific OpenSSL version to use?
msg101705 - (view) Author: Florent Xicluna (flox) * (Python committer) Date: 2010-03-25 14:52
Patch ftplib/test_ftplib updated.
It is as before (0.9.8k): SSL_ERROR_WANT_READ is ignored on connection unwrap().
msg101706 - (view) Author: Florent Xicluna (flox) * (Python committer) Date: 2010-03-25 14:57
Giampaolo, IMHO you need 0.9.8m to reproduce the issue.

I installed it from the Debian testing repo (I'm on debian).
~ $ apt-cache policy openssl
openssl:
  Installed : 0.9.8m-2
  Candidate : 0.9.8m-2
 Table de version :
 *** 0.9.8m-2 0
        400 ftp://ftp.lip6.fr testing/main Packages
        300 ftp://ftp.lip6.fr sid/main Packages
        100 /var/lib/dpkg/status
     0.9.8g-15+lenny6 0
        900 ftp://ftp.lip6.fr lenny/main Packages
        900 http://security.debian.org lenny/updates/main Packages

Maybe you could catch it on "Lucid" repositories, and if you have some chance, it may be compatible with your ubuntu version.
msg101729 - (view) Author: Giampaolo Rodola' (giampaolo.rodola) * (Python committer) Date: 2010-03-25 19:33
Ok, I've finally been able to reproduce the issue by installing OpenSSL 0.9.8m on my Ubuntu box.
By doing some debugging I'm starting to think that maybe there's something wrong with the shutdown() method implementation itself because no matter what, sooner or later it always ends up raising "error: [Errno 0] Error".

I've cheated by trapping shutdown() in a while loop waiting for it to succeed, I've tried to manually call handle_read() and handle_write(), I've also tried to set the socket back to blocking mode just to see if shutdown() would return but I've always ended up encountering "error: [Errno 0] Error".

IMHO, before going any further I'd investigate on why this call can just fail without providing any information about what happened.
From the upper application's point of view there's nothing it can do in case of such an error.
Assuming that "the remote side closed the connection abruptly" and close the socket is just wrong.
msg102292 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-04-03 20:15
I don't think this should be a release blocker for the beta, although I agree it should be one for the final release (and RCs).
By the way, it doesn't affect maintenance branches.
msg102293 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-04-03 20:16
Giampaolo, which call exactly triggers the "error: [Errno 0] Error"?
It doesn't seem it can be _sslobj.shutdown itself.
msg102394 - (view) Author: Giampaolo Rodola' (giampaolo.rodola) * (Python committer) Date: 2010-04-05 18:25
Yes, it's the _sslobj.shutdow() call:

  File "test_ftplib.py", line 332, in handle_close
    self.socket = self.socket.unwrap()
  File "/usr/local/lib/python2.7/ssl.py", line 258, in unwrap
    s = self._sslobj.shutdown()
error: [Errno 0] Error

Since it's not clear to me where exactly this comes from, whether it's from the Python C binding or OpenSSL itself, I tried to put some debugging printf() calls in Modules/_ssl.c, but it seems that  after installing OpenSSL 0.9.8m I'm no longer able to compile Python 2.7 from sorces.
This is what I get when I run ./configure; make:

gcc -pthread -fPIC -fno-strict-aliasing -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -I. -IInclude -I./Include -I/usr/local/include -I/home/giampaolo/svn/python-2.7/Include -I/home/giampaolo/svn/python-2.7 -c /home/giampaolo/svn/python-2.7/Modules/_ssl.c -o build/temp.linux-i686-2.7/home/giampaolo/svn/python-2.7/Modules/_ssl.o
gcc -pthread -shared build/temp.linux-i686-2.7/home/giampaolo/svn/python-2.7/Modules/_ssl.o -L/usr/local/lib -lssl -lcrypto -o build/lib.linux-i686-2.7/_ssl.so
*** WARNING: renaming "_ssl" since importing it failed: build/lib.linux-i686-2.7/_ssl.so: undefined symbol: inflate
msg102395 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-04-05 18:30
> Since it's not clear to me where exactly this comes from, whether it's
> from the Python C binding or OpenSSL itself, I tried to put some
> debugging printf() calls in Modules/_ssl.c, but it seems that  after
> installing OpenSSL 0.9.8m I'm no longer able to compile Python 2.7
> from sorces.

Have you tried `make distclean`?
msg102396 - (view) Author: Giampaolo Rodola' (giampaolo.rodola) * (Python committer) Date: 2010-04-05 18:37
No I haven't, but I tried just now and I get the same error.
msg102398 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-04-05 19:28
> No I haven't, but I tried just now and I get the same error.

I think "inflate" is a function exported by the zlib. Perhaps you can
add "-lz" to the linking flags.
(Googling hints that OpenSSL can depend on the zlib if compression is
enabled)
msg102407 - (view) Author: Giampaolo Rodola' (giampaolo.rodola) * (Python committer) Date: 2010-04-05 20:43
As suggested in this thread:
http://mirt.net/pipermail/stunnel-users/2005-July/000661.html
...I made the following change to the Makefile:

- LIBS=       -lpthread -ldl  -lutil
+ LIBS=       -lpthread -ldl  -lutil -lz

That made the error change as follows:

running build_ext

INFO: Can't locate Tcl/Tk libs and/or headers

Python build finished, but the necessary bits to build these modules were not found:
_bsddb             _curses            _curses_panel   
_sqlite3           _tkinter           bsddb185        
bz2                dbm                gdbm            
readline           sunaudiodev
msg102410 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-04-05 20:46
I've tried to build against OpenSSL 1.0.0, and I get further failures in test_ssl. Since I don't know whether they are related, I've created a separate issue for them: issue8322
msg102449 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-04-06 10:45
Ok, here is a patch fixing all test_ftplib failures (tested with openssl 0.9.8k and 1.0.0).
There are a couple of fixes here:
- a workaround for the strange SSL_shutdown() error return, also witnessed by LightHTTPd users (see URL in comments)
- an auto-retry for SSLSocket.unwrap() in blocking mode
- a bizarre issue in test_ftplib where DTPHandler.handle_close() gets called many times in a test (test_data_connection)

I think the SSL_shutdown() issue should be checked on the OpenSSL mailing-list.
msg102450 - (view) Author: Giampaolo Rodola' (giampaolo.rodola) * (Python committer) Date: 2010-04-06 11:05
def handle_error(self):
-            raise
+            # Ignore errors while closing, because the remote end could have
+            # abruptly shut down the TCP connection while we are still
+            # waiting for SSL shutdown to finish.
+            if not self._ssl_closing:
+                raise


I don't like this.
I should have said that I think my original patch is wrong as it attempts to shutdown() the SSL layer in asyncore's close() method.
I think that I should have done that in handle_close() instead of close() because the two methods are called in different circumstances and have different meanings:

- handle_close() is supposed to be called when the connection is a about to be closed "in a clean way" (for example: the other end properly finished to transmit all the data).

- close() may be called in different situations, also when the connection gets closed abruptly. In this case we should avoid to shutdown() the SSL layer as we might be dealing with a "dead" connection. The right thing to do in this case is just close() the socket without doing anything else.

IMHO, what we should try to avoid to do here is silently ignoring any error condition as you did in handle_error() method.
What happens if you move the SSL shutdown() call in handle_close() method and leave handle_error() unchanged?
msg102451 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-04-06 11:11
> What happens if you move the SSL shutdown() call in handle_close()
> method and leave handle_error() unchanged?

I get occasional crashes such as the following:

Exception in thread Thread-60:
Traceback (most recent call last):
  File "/home/antoine/cpython/newssl/Lib/threading.py", line 530, in __bootstrap_inner
    self.run()
  File "/home/antoine/cpython/newssl/Lib/test/test_ftplib.py", line 229, in run
    asyncore.loop(timeout=0.1, count=1)
  File "/home/antoine/cpython/newssl/Lib/asyncore.py", line 211, in loop
    poll_fun(timeout, map)
  File "/home/antoine/cpython/newssl/Lib/asyncore.py", line 154, in poll
    write(obj)
  File "/home/antoine/cpython/newssl/Lib/asyncore.py", line 88, in write
    obj.handle_error()
  File "/home/antoine/cpython/newssl/Lib/asyncore.py", line 84, in write
    obj.handle_write_event()
  File "/home/antoine/cpython/newssl/Lib/test/test_ftplib.py", line 313, in handle_write_event
    super(SSLConnection, self).handle_write_event()
  File "/home/antoine/cpython/newssl/Lib/asyncore.py", line 440, in handle_write_event
    self.handle_write()
  File "/home/antoine/cpython/newssl/Lib/asynchat.py", line 174, in handle_write
    self.initiate_send()
  File "/home/antoine/cpython/newssl/Lib/asynchat.py", line 215, in initiate_send
    self.handle_close()
  File "/home/antoine/cpython/newssl/Lib/test/test_ftplib.py", line 352, in handle_close
    self._do_ssl_shutdown()
  File "/home/antoine/cpython/newssl/Lib/test/test_ftplib.py", line 290, in _do_ssl_shutdown
    self.socket.unwrap()
  File "/home/antoine/cpython/newssl/Lib/ssl.py", line 264, in unwrap
    s = self._sslobj.shutdown()
error: [Errno 32] Broken pipe
msg102452 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-04-06 11:15
By the way, this "broken pipe" error could be due to the fact that the
FTP_TLS client never tries to call unwrap() on its SSL socket. Perhaps
the close() method should be overriden?
msg102455 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-04-06 11:39
Here is a new patch. It doesn't change handle_error() but instead ignores abruptly closed sockets when doing the SSL shutdown.

The underlying problem seems to be an asyncore limitation: close() can be called after an error such as EPIPE or ECONNABORTED, but asyncore doesn't give us any way to know in advance.
msg102456 - (view) Author: Giampaolo Rodola' (giampaolo.rodola) * (Python committer) Date: 2010-04-06 11:43
> By the way, this "broken pipe" error could be due to the fact that the
> FTP_TLS client never tries to call unwrap() on its SSL socket. Perhaps
> the close() method should be overriden?

ftplib.FTP_TLS class already calls unwrap() but only when closing a "secured" *data* connection.
This is never done for the *control* connection as the examples shown in RFC-4217 do that only when dealing with the CCC command which is intended to switch the control connection back to clear text.
Since ftplib.py does not implement the CCC command I would avoid to override its close() method.

As for our test server we are currently shutting down the SSL layer for both connections (control and data) so this might be the cause of EPIPE.
At this proposal I suggest to move the shutdown() call from SSLConnection.handle_close() to DummyTLS_DTPHandler.handle_close() and see what happens.
If EPIPE keeps being raised then I would investigate on what EPIPE means in terms of OpenSSL itself. It might mean that the socket is supposed to be closed in which case it's perfectly legitimate.

I'd have a few questions:

> a workaround for the strange SSL_shutdown() error return, also
> witnessed by LightHTTPd users (see URL in comments)

What happens now? I mean... what do we get instead of socket.error: 0?

> an auto-retry for SSLSocket.unwrap() in blocking mode

Wasn't SSL_MODE_AUTO_RETRY option applied in issue 8222 supposed to already do that for us?

> I think the SSL_shutdown() issue should be checked on the OpenSSL 
> mailing-list.

If socket.error: 0 comes from OpenSSL then yes. I was suspecting something like this.
msg102457 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-04-06 11:54
> As for our test server we are currently shutting down the SSL layer for
> both connections (control and data) so this might be the cause of EPIPE.

I don't understand your reasoning. These are separate connections, closing one shouldn't affect the other.

> At this proposal I suggest to move the shutdown() call from
> SSLConnection.handle_close() to DummyTLS_DTPHandler.handle_close() and 
> see what happens.

What happens is that you are taking shortcuts with SSL since you aren't trying to do a proper shutdown :)
If you really think this is a good idea, feel free to give it a try.
But please see newssl2.patch as well.

(of course this is only a test server but AFAIU you would like to make it right)

> What happens now? I mean... what do we get instead of socket.error: 0?

A normal successful return.

> Wasn't SSL_MODE_AUTO_RETRY option applied in issue 8222 supposed to 
> already do that for us?

SSL_MODE_AUTO_RETRY deals with SSL negotiation. SSL_shutdown() has its own separate semantics, which have changed *even in blocking mode*. If I disable auto-retry I get failures such as:

======================================================================
ERROR: test_storbinary (test.test_ftplib.TestTLS_FTPClassMixin)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/antoine/cpython/newssl/Lib/test/test_ftplib.py", line 505, in test_storbinary
    self.client.storbinary('stor', f)
  File "/home/antoine/cpython/newssl/Lib/ftplib.py", line 727, in storbinary
    conn.unwrap()
  File "/home/antoine/cpython/newssl/Lib/ssl.py", line 264, in unwrap
    s = self._sslobj.shutdown()
SSLError: [Errno 2] _ssl.c:1374: The operation did not complete (read)

(note how "the operation did not complete" seems to mean that the socket is non-blocking, but the error is happening with the regular blocking FTP client)
msg102459 - (view) Author: Giampaolo Rodola' (giampaolo.rodola) * (Python committer) Date: 2010-04-06 12:23
> I don't understand your reasoning. These are separate connections, 
> closing one shouldn't affect the other.

I meant another thing. I was talking about the fact that the test server attempts to shutdown() both control and data connections while the client only shutdown() the latter one. Maybe that's why you get EPIPE.

> If you really think this is a good idea, feel free to give it a try.

I whish but I'm having serious setup problems problems because I attempted to install different OpenSSL versions and now I'm no longer able to compile Python.
Anyway, I'll give another try later today. =)

> But please see newssl2.patch as well.

- ftplib.FTP_TLS.quit: I wouldn't call unwrap() here as RFC-4217 does that only when dealing with CCC command which is not implemented by ftplib.py.

- I'd be for calling _do_ssl_shutdown() from handle_close() instead of from close() (reasons explained in my previous message)

- If EPIPE is still raised I'd try to move the shutdown() code from SSLConnection.handle_close() to DummyTLS_DTPHandler.handle_close().
msg102462 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-04-06 13:10
> - ftplib.FTP_TLS.quit: I wouldn't call unwrap() here as RFC-4217 does
> that only when dealing with CCC command which is not implemented by
> ftplib.py.

Ok.

> - I'd be for calling _do_ssl_shutdown() from handle_close() instead of
> from close() (reasons explained in my previous message)

This provokes test failures such as:

======================================================================
ERROR: test_dir (test.test_ftplib.TestTLS_FTPClassMixin)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/antoine/cpython/newssl/Lib/test/test_ftplib.py", line 537, in test_dir
    self.client.dir(lambda x: l.append(x))
  File "/home/antoine/cpython/newssl/Lib/ftplib.py", line 511, in dir
    self.retrlines(cmd, func)
  File "/home/antoine/cpython/newssl/Lib/ftplib.py", line 714, in retrlines
    return self.voidresp()
  File "/home/antoine/cpython/newssl/Lib/ftplib.py", line 224, in voidresp
    resp = self.getresp()
  File "/home/antoine/cpython/newssl/Lib/ftplib.py", line 210, in getresp
    resp = self.getmultiline()
  File "/home/antoine/cpython/newssl/Lib/ftplib.py", line 196, in getmultiline
    line = self.getline()
  File "/home/antoine/cpython/newssl/Lib/ftplib.py", line 183, in getline
    line = self.file.readline()
  File "/home/antoine/cpython/newssl/Lib/socket.py", line 445, in readline
    data = self._sock.recv(self._rbufsize)
  File "/home/antoine/cpython/newssl/Lib/ssl.py", line 98, in <lambda>
    self.recv = lambda buflen=1024, flags=0: SSLSocket.recv(self, buflen, flags)
  File "/home/antoine/cpython/newssl/Lib/ssl.py", line 217, in recv
    return self.read(buflen)
  File "/home/antoine/cpython/newssl/Lib/ssl.py", line 138, in read
    return self._sslobj.read(len)
SSLError: The read operation timed out

I have already tested quite a bunch of possibilities (handle_close or
close, silencing EPIPE or not...)

> - If EPIPE is still raised I'd try to move the shutdown() code from
> SSLConnection.handle_close() to DummyTLS_DTPHandler.handle_close().

This only makes the server less respectful of the protocol...
I think it is fine to silence EPIPE and friends during SSL shutdown
since, as I've said, asyncore will call handle_close() when the remote
ends has closed the TCP connection.
msg102464 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-04-06 13:17
New patch without the quit() method change.
msg102639 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-04-08 18:24
Ok, I raised the "errno 0" issue on the openssl mailing-list, and I had a nice response by Darryl Miles here:
http://www.mail-archive.com/openssl-users@openssl.org/msg60710.html

The bottom line is that any "socket error" return from unwrap(), that is any "SSL_ERROR_SYSCALL" return from SSL_shutdown(), means that the socket is closed (abruptly or not). In this case, we can't do anything except ignore the error in test_ftplib.

We should also remove the special casing hack in _ssl.c, even though a socket error with "errno 0" isn't particularly informative.

Here is a new patch.
msg102641 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-04-08 20:15
My analysis of SSL_shutdown() was missing something: timeout sockets are really non-blocking sockets in disguise. I guess it's never too late to notice.

OpenSSL doesn't know about the timeout, so we have to do it ourselves. Ironically, the infrastructure is already in place in _ssl.c, and used for read() and write() methods.

So here is a new patch, taking this into account.
msg102689 - (view) Author: Darryl Miles (dlmiles) Date: 2010-04-09 06:54
In order to build Python with a specific version of OpenSSL followed the CYGWIN instructions and edited Modules/Setup to make it read (note - I added "-L$(SSL)" into the linker options too, since by default on CentOS 5.4 i386 OpenSSL build in static library mode ala ../openssl-1.0.0/libssl.a) :

SSL=../openssl-1.0.0
_ssl _ssl.c \
        -DUSE_SSL -I$(SSL)/include -I$(SSL)/include/openssl \
        -L$(SSL)/lib -L$(SSL) -lssl -lcrypto


It is not clear to me what Python's goals are:
 * To be backward compatible, in which case I don't know your historical use of SSL_shutdown().
 * To be a thin-layer (1:1) over OpenSSL, so that "power users" can harness the full potential of OpenSSL if they are willing to understand the finer points.
 * To provide a full-featured Python API.
 * To provide a Python API that is easy to use within the Python paradigm.

These goals may not be convergent.
msg102731 - (view) Author: Darryl Miles (dlmiles) Date: 2010-04-09 16:46
I've updated my attachment to the bug, if you read the old one please re-read the updated version (since some points in there were not accurate).

With regards to the OpenSSL error return -1/ERROR_SYSCALL with errno==0 being observed, I shall respond into the OpenSSL mailing list with a fuller response.  The man page SSL_get_error(3) does explain what getting a zero error means in relation to end-of-file at the BIO/TCP socket level.  In light of the presumption by me that the problem was because one end did a syscall close(fd) this makes perfect sense in the context of your observation and OpenSSL appears to be working as documented.  There is also code to print out the error in Python at Modules/_ssl.c method PySSL_SetError() so I'm not sure of the source of the funny looking error printing in relation to the ftpcli test case, consider it to be an error message formatting glitch.


Now the issue I see here is that there are clearly 3 use cases Python should provide:

 * one-shot raw mode (don't enter the loop at all, as per newssl5.patch/my attachment, this is more or less what you already have in CVS, but I would remove the 2nd call to SSL_shutdown(), raw mode means exactly that; the caller is in charge of calling it again, thin layer for Python power users) [case-1]

 * perform IO sleep/wait as necessary until we observe SSL_shutdown()==0 (or better!  so this will return if 0 or 1 is returned) [case-2]

 * perform IO sleep/wait as necessary until we observe SSL_shutdown()==1 [case-3]

I presume you already have a way of handling the configuration of I/O timeouts as per Python's IO processing model (that is provided by some other API mechanism).



The question is what is the best way to provide them (what is inline with the Python paradigm?) :

 * one method, keep existing named method, add new optional argument that can indicate all 3 modes of operation.  Debate which of the 3 modes of operation is the default when the argument is not specified, case-1 seems to most backwardly compatible.  [I am presuming python supports optional arguments]

 * new method, keep existing as-is (to cover case 1), implement case-2 and case-3 in the method which also take an argument for the user to specify which use case they want.


From this a patch should be straight-forward.  Then we can look to see if the FTP client or server is doing anything wrong in light of having the building blocks in place to achieve any goal on top of OpenSSL.
msg102852 - (view) Author: Darryl Miles (dlmiles) Date: 2010-04-11 16:47
I am unable to get "make test" to run from an unpatched version in SVN (details below of make output).

Please find attached an updated patch for your consideration (and testing, as I can't test it due to 'make test' SIGSEGV on CentOS 5.4 i386).


Patch Notes:

1) Some thing that concern me, the "unwrap()" philosophy looks to be used to remove SSL from the Python high-level socket handle, so you can go back to plaintext mode.  You can ONLY perform an unwrap() AFTER an SSL_shutdown()==1 has been observed (you need to wait for the other end to do something voluntarily).

So you must retry the SSL_shutdown() over and over while you sleep-wait for IO, so this is akin to calling the ssl.shutdown(ssl.SSL_SHUTDOWN_MODE_BOTH) and getting back success.

Also if it is your intention to properly implement an unwrap() like this you should disable IO read-ahead mode before calling shutdown for the second time, SSL_set_read_ahead(ssl, 0).  This stops OpenSSL from eating too many bytes accidentally (probably from the kernel into its own buffers), from the inbound IO stream, which may not be SSL protocol data, it maybe plain text data (behind the last byte of SSL protocol data).


2) Due to the IO waiting it looks also necessary to copy the setup of SSL_set_nbio() from the read/write paths so the check_socket_and_wait_for_timeout() works in sympathy to the callers IO timeout reconfiguration.


3) My patch presumes the allocation of the type "struct PySSLObject" uses calloc() or some other memory zeroing strategy.  There is a new member in that struct to track if SSL_shutdown() has previously returned a zero value.


4) The SSL_peek() error path needs checking to see if the error return is consistent with the Python paradigm.


5) Please check I have understand the VARARGS method correctly.  I have made the default to SSL_SHUTDOWN_MODE_SENT (despite backward compatibly being SSL_SHUTDOWN_MODE_ONCE), this is because I would guess that most high-level applications did not intend to use it in raw mode; nor be  bothered with the issues surrounding correct usage.  I would guess high-level applications wanted Python to take the strain here.


6) I suspect you need to address your unwrap() policy a little better, the shutdown operation and the unwrap() are two different matters.  The shutdown() should indicate success or not (in respect of the mode being requested, raw mode is a tricky one as the caller would want to the exact error return so it can do the correct thing), unwrap() should itself call ssl.shutdown(ssl.SSL_SHUTDOWN_MODE_BOTH) until it sees success and then remove the socket (and deallocate SSL objects).

As things stand SSL_SHUTDOWN_MODE_ONCE does not work in a useful way since the error returns are not propagated to the caller, because "unwrap" is mixed into this.  So that would still need fixing.



building works ok, testing fails with SIGSEGV.  Is this something to do with no having _bsddb built ?  I have db-4.3 working.  Maybe someone can reply by email on the matter.

# make
running build
running build_ext
building dbm using gdbm

Python build finished, but the necessary bits to build these modules were not found:
bsddb185           sunaudiodev                        
To find the necessary bits, look in setup.py in detect_modules() for the module's name.

running build_scripts




# make test
running build
running build_ext
building dbm using gdbm

Python build finished, but the necessary bits to build these modules were not found:
bsddb185           sunaudiodev                        
To find the necessary bits, look in setup.py in detect_modules() for the module's name.

running build_scripts
find ./Lib -name '*.py[co]' -print | xargs rm -f
./python -Wd -3 -E -tt ./Lib/test/regrtest.py -l 
== CPython 2.7a4+ (trunk:79902M, Apr 11 2010, 16:38:55) [GCC 4.1.2 20080704 (Red Hat 4.1.2-46)]
==   Linux-2.6.18-164.15.1.el5-i686-with-redhat-5.4-Final
==   /root/python-svn/build/test_python_29248
test_grammar
test_opcodes
test_dict
test_builtin
test_exceptions
test_types
test_unittest
test_doctest
test_doctest2
test_MimeWriter
test_SimpleHTTPServer
test_StringIO
test___all__
/root/python-svn/Lib/test/test___all__.py:10: DeprecationWarning: in 3.x, the bsddb module has been removed; please use the pybsddb project instead
  import bsddb
/root/python-svn/Lib/bsddb/__init__.py:67: PendingDeprecationWarning: The CObject type is marked Pending Deprecation in Python 2.7.  Please use capsule objects instead.
  import _bsddb
make: *** [test] Segmentation fault
msg102857 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-04-11 17:45
To just run the test_ftplib tests, you can use:

$ ./python -m test.regrtest -v -uall test_ftplib

(Other tests involving SSL sockets are test_ssl, test_smtpnet,
test_imaplib and test_poplib)

> 1) Some thing that concern me, the "unwrap()" philosophy looks to be
> used to remove SSL from the Python high-level socket handle, so you
> can go back to plaintext mode.  You can ONLY perform an unwrap() AFTER
> an SSL_shutdown()==1 has been observed (you need to wait for the other
> end to do something voluntarily).

When the SSL shutdown fails, an exception is raised, which means the
rest of the unwrapping (at the Python "high-level" socket level) doesn't
occur.
Therefore, it is safe to call unwrap() again from user code because the
SSL object is still there.

> Also if it is your intention to properly implement an unwrap() like
> this you should disable IO read-ahead mode before calling shutdown for
> the second time, SSL_set_read_ahead(ssl, 0).  This stops OpenSSL from
> eating too many bytes accidentally (probably from the kernel into its
> own buffers), from the inbound IO stream, which may not be SSL
> protocol data, it maybe plain text data (behind the last byte of SSL
> protocol data).

Do you know how to cook a simple test to exercise this?

> 2) Due to the IO waiting it looks also necessary to copy the setup of
> SSL_set_nbio() from the read/write paths so the
> check_socket_and_wait_for_timeout() works in sympathy to the callers
> IO timeout reconfiguration.

Thanks for spotting this.

> 5) Please check I have understand the VARARGS method correctly.  I
> have made the default to SSL_SHUTDOWN_MODE_SENT (despite backward
> compatibly being SSL_SHUTDOWN_MODE_ONCE), this is because I would
> guess that most high-level applications did not intend to use it in
> raw mode; nor be  bothered with the issues surrounding correct usage.
> I would guess high-level applications wanted Python to take the strain
> here.

Yes, sounds right indeed. I'm not sure we need a choice of shutdown
modes at all.

> building works ok, testing fails with SIGSEGV.  Is this something to
> do with no having _bsddb built ?  I have db-4.3 working.  Maybe
> someone can reply by email on the matter.

_bsddb seems to be built, it's the old bsddb185 which isn't.
The module apparently breaks when importing it, can you open a separate
issue for it?

I'd like Bill Janssen's opinion on these proposed changes. Bill, can you
take a look?
msg102867 - (view) Author: Darryl Miles (dlmiles) Date: 2010-04-11 18:54
To explain why you need 2 modes, a client/server would expect to do the following pseudo actions for maximum efficiency:

set_socket_timeout(600_SECONDS)  # or useful default
send_data_over_ssl("QUIT\r\n")
shutdown(SSL_SHUTDOWN_MODE_SENT)
flush_data_down_to_socket()   # maybe automatic/implied (OpenSSL users with custom BIO layers should be aware of this step)
shutdown(socket, SHUT_WR)   # this is optional, TCP socket level shutdown
recv_data_over_ssl() = "250 Bye bye!\r\n"  # this will take time to arrive
set_socket_io_timeout(5_SECONDS)
shutdown(SSL_SHUTDOWN_MODE_BOTH)  # this is optional!  some clients may choose to skip it entirely
close()/unwrap()


A server would:

recv_data_over_ssl() = "QUIT\r\n"  # would be sitting idle waiting for this command
send_data_over_ssl("250 Bye bye!\r\n")
shutdown(SSL_SHUTDOWN_MODE_SENT)
flush_data_down_to_socket()   # maybe automatic/implied (OpenSSL users with custom BIO layers should be aware of this step)
shutdown(socket, SHUT_WR)   # this is optional, TCP socket level shutdown
set_socket_io_timeout(30_SECONDS)
shutdown(SSL_SHUTDOWN_MODE_BOTH)  # a good server would implement this step
close()/unwrap()


Now if your outbound data is CORKed and flushed, the flush points would cause all the SSL data from both the 'last sent data' and the 'send shutdown notify' to go out in the same TCP segment and arrive at the other end more or less together.

Doing any of the above in a different order introduces some kind of inefficiency.  shutdown(fd, SHUT_WR) are often used at the socket level to help the manage TIME_WAIT.

The client has to wait for the QUIT response message anyway.  With the above sequence there is no additional time delay or cost with both parties performing a SSL protocol shutdown at the same time.  Despite the IO timeouts existing (to provide a safety net).

If the client is talking to a buggy server the worst case scenario is that it receives the quit response but the server never does an SSL shutdown and the server doesn't close the socket connection.  In this situation the client will have to wait for IO timeout, some clients in other software use blocking sockets and don't have a timeout so they end up hooked (forever).
msg103354 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-04-16 19:32
After thinking a bit about it, we're in quasi-feature freeze now for 2.x and it doesn't sound reasonable to add options to unwrap(); we would need to write tests to exercise these options and their behaviour. Also, they aren't strictly needed to fix the issues at hand, but provide a functional benefit.

So I've taken some of Darryl's ideas and fixes and incorporated them. Here is a new patch with them. I've checked compatibility with OpenSSL 1.0.0 and 0.9.8k. If nobody manifests him/herself I will commit it soon.

(note: I dropped the idea of checking for application data with SSL_peek(), since it seems to be a blocking call; at least one of the test_ssl test cases froze here)
msg103412 - (view) Author: Giampaolo Rodola' (giampaolo.rodola) * (Python committer) Date: 2010-04-17 17:27
If everyone agrees on "error: [Errno 0] Error" being a legitimate alias for a "connection closed event" condition then I'd say the test server looks good, even if I think that expecting a ssl.SSLError derived exception would have made more sense, but I understand this is a OpenSSL inherited behavior.

Anyway, I'd like to improve the SSL test server in future as I'm sure it hides some other problems but I can do that once pyftpdlib's SSL server code is more mature and after 2.7 is released.
msg103418 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-04-17 18:11
> If everyone agrees on "error: [Errno 0] Error" being a legitimate alias 
> for a "connection closed event" condition then I'd say the test server 
> looks good, even if I think that expecting a ssl.SSLError derived
> exception would have made more sense, but I understand this is a
> OpenSSL inherited behavior.

Well, it's quite an unfortunate error display, so perhaps we could detect it and turn into something more meaningful instead.
However, I think it should remain a socket error or an OS error, because it's really the underlying socket which is closed, not the SSL wrapper.

> Anyway, I'd like to improve the SSL test server in future as I'm sure > it hides some other problems but I can do that once pyftpdlib's SSL
> server code is more mature and after 2.7 is released.

Shouldn't it become part of asyncore at some point? It's both useful (well, for people using asyncore anyway :-)) and non-trivial.
msg103450 - (view) Author: Darryl Miles (dlmiles) Date: 2010-04-17 23:18
With regards to create test cases for certain situations, sure this would be possible but not with "pure python" since your APIs deny/inhibit the particular things required to force a situation for a test case.

With regards to SSL_peek() blocking, you'd need to explain yourself better on that one.  The patch has been tested with the test cases from Python SVN enough to be happy they run ok.  Maybe you have some offline not yet checked in SSL test cases you are referring to.  To clarify why this is being done, if there is unread data then SSL_shutdown() will never return 1.  Maybe you can simulate this situation by using SSL_write() with 1 byte payloads and a 10ms delay between each SSL_write() of the "QUIT response message" (you are trying to simulate network propagation delay).  Then you have a client that tries to do unwrap() right after having sent the quit command, but makes no attempt to receive the response.

I'll leave you guys too it about how you want to handle things with python (i.e. to make the design choice trade offs).  I think all the points have been covered.
msg103604 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-04-19 14:26
> With regards to SSL_peek() blocking, you'd need to explain yourself
> better on that one.  The patch has been tested with the test cases
> from Python SVN enough to be happy they run ok.

IIRC it worked with OpenSSL 1.0.0 but failed with 0.9.8k.
I could test it again if you think that is unlikely.

Can SSL_pending() be used in place of SSL_peek() in this case?
(note: there doesn't seem to be any documentation for SSL_peek() on the
openssl.org website)

> I'll leave you guys too it about how you want to handle things with
> python (i.e. to make the design choice trade offs).  I think all the
> points have been covered.

Thanks a lot for your help. I believe it has been fruitful.
msg103987 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-04-23 00:18
Patch committed to r80392 (trunk) and r80394 (py3k). Am pondering a partial backport of the SSL shutdown fixes to 2.6 and 3.1 (test_ftplib doesn't need any backport since it doesn't support TLS in these branches).
msg104053 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-04-23 21:13
Ok, I've backported the shutdown fixes to 2.6 (r80419) and 3.1 (r80420). Thanks again everyone.
msg151688 - (view) Author: Maxim Egorushkin (max0x7ba) Date: 2012-01-20 12:13
I am still having this issue with Python-2.7.2 (which includes the patch) and openssl-devel-1.0.0e-1.fc14.x86_64:

  File "./download.py", line 195, in download_file
    ftp.retrbinary("retr " + filename, lambda data: file.write(data))
  File "/usr/local/ots/python/lib/python2.7/ftplib.py", line 691, in retrbinary
    conn.unwrap()
  File "/usr/local/ots/python/lib/python2.7/ssl.py", line 275, in unwrap
    s = self._sslobj.shutdown()
socket.error: [Errno 0] Error
History
Date User Action Args
2022-04-11 14:56:58adminsetgithub: 52355
2012-01-20 12:13:45max0x7basetnosy: + max0x7ba
messages: + msg151688
2010-04-23 21:13:53pitrousetstatus: pending -> closed

messages: + msg104053
versions: + Python 2.6, Python 3.1
2010-04-23 00:18:29pitrousetstatus: open -> pending
messages: + msg103987

assignee: pitrou
resolution: fixed
stage: patch review -> resolved
2010-04-19 14:26:22pitrousetmessages: + msg103604
2010-04-17 23:18:16dlmilessetmessages: + msg103450
2010-04-17 18:11:27pitrousetmessages: + msg103418
2010-04-17 17:27:52giampaolo.rodolasetmessages: + msg103412
2010-04-16 19:32:05pitrousetfiles: + newssl6.patch

messages: + msg103354
2010-04-11 18:54:10dlmilessetmessages: + msg102867
2010-04-11 17:45:34pitrousetmessages: + msg102857
2010-04-11 16:47:42dlmilessetfiles: + Modules__ssl.c.patch

messages: + msg102852
2010-04-09 16:46:20dlmilessetfiles: - python_ssl.c.txt
2010-04-09 16:46:10dlmilessetfiles: + python_ssl_v2.c.txt

messages: + msg102731
2010-04-09 06:54:12dlmilessetfiles: + python_ssl.c.txt
nosy: + dlmiles
messages: + msg102689

2010-04-08 20:15:10pitrousetfiles: + newssl5.patch

messages: + msg102641
2010-04-08 18:24:44pitrousetfiles: + newssl4.patch

messages: + msg102639
2010-04-06 13:17:02pitrousetfiles: + newssl3.patch

messages: + msg102464
2010-04-06 13:10:55pitrousetmessages: + msg102462
2010-04-06 12:23:13giampaolo.rodolasetmessages: + msg102459
2010-04-06 11:54:30pitrousetmessages: + msg102457
versions: - Python 2.6, Python 3.1
2010-04-06 11:43:09giampaolo.rodolasetmessages: + msg102456
2010-04-06 11:39:26pitrousetfiles: + newssl2.patch

messages: + msg102455
2010-04-06 11:15:09pitrousetmessages: + msg102452
2010-04-06 11:11:15pitrousetmessages: + msg102451
2010-04-06 11:05:35giampaolo.rodolasetmessages: + msg102450
2010-04-06 10:46:05pitrousetfiles: + newssl.patch
2010-04-06 10:45:42pitrousetmessages: + msg102449
2010-04-05 20:46:52pitrousetmessages: + msg102410
2010-04-05 20:43:16giampaolo.rodolasetmessages: + msg102407
2010-04-05 19:28:16pitrousetmessages: + msg102398
2010-04-05 18:37:04giampaolo.rodolasetmessages: + msg102396
2010-04-05 18:30:54pitrousetmessages: + msg102395
2010-04-05 18:28:47floxsetnosy: + jcea
2010-04-05 18:25:38giampaolo.rodolasetnosy: - jcea

messages: + msg102394
versions: + Python 2.6, Python 3.1
2010-04-05 13:33:15jceasetnosy: + jcea
2010-04-03 20:44:34benjamin.petersonsetpriority: release blocker -> critical
2010-04-03 20:16:41pitrousetmessages: + msg102293
2010-04-03 20:15:16pitrousetmessages: + msg102292
versions: - Python 2.6, Python 3.1
2010-04-03 16:41:58benjamin.petersonsetpriority: high -> release blocker
2010-03-25 19:33:19giampaolo.rodolasetmessages: + msg101729
2010-03-25 14:57:11floxsetmessages: + msg101706
2010-03-25 14:52:59floxsetfiles: + issue8108_ftplib_ssl_098m_v2.diff

messages: + msg101705
2010-03-25 14:50:52giampaolo.rodolasetmessages: + msg101704
2010-03-25 14:49:06floxsetfiles: - issue8108_ftplib_ssl_098m.diff
2010-03-25 14:11:58floxsetfiles: + issue8108_print_ssl_want_read.diff

messages: + msg101699
2010-03-25 13:30:12pitrousetmessages: + msg101694
2010-03-25 08:32:48floxsetfiles: + issue8108_ftplib_ssl_098m.diff

messages: + msg101678
2010-03-25 08:31:40giampaolo.rodolasetmessages: + msg101677
2010-03-25 00:28:44floxsetfiles: + issue8108_openssl_098m.diff

messages: + msg101661
stage: needs patch -> patch review
2010-03-24 22:16:13pitrousetstage: test needed -> needs patch
versions: + Python 2.6, Python 3.1
2010-03-24 22:09:26pitrousetmessages: + msg101654
2010-03-22 21:14:33giampaolo.rodolasetmessages: + msg101534
2010-03-22 15:58:30pitrousetmessages: + msg101517
2010-03-15 00:11:28janssensetmessages: + msg101074
2010-03-15 00:05:22janssensetmessages: + msg101073
2010-03-14 22:42:26pitrousetmessages: + msg101070
2010-03-14 20:02:49vstinnersetmessages: + msg101066
2010-03-14 19:44:49janssensetmessages: + msg101065
2010-03-14 17:26:42Arfreversetnosy: + Arfrever
2010-03-13 02:58:16pitrousetmessages: + msg100985
2010-03-12 23:24:49vstinnersetmessages: + msg100980
2010-03-12 22:01:42giampaolo.rodolasetfiles: + testftpd.patch

messages: + msg100977
2010-03-12 20:19:54janssensetmessages: + msg100965
2010-03-12 20:18:01janssensetmessages: + msg100964
2010-03-12 20:10:40janssensetmessages: + msg100963
2010-03-12 20:05:24giampaolo.rodolasetmessages: + msg100962
2010-03-12 19:48:36janssensetmessages: + msg100961
2010-03-12 19:23:55janssensetmessages: + msg100958
2010-03-12 00:53:35vstinnersetfiles: + ssl_shutdown.patch
keywords: + patch
messages: + msg100899
2010-03-11 23:09:57pitrousetnosy: + janssen
2010-03-11 23:06:25vstinnersetmessages: + msg100892
2010-03-11 21:48:34vstinnersetmessages: + msg100889
2010-03-11 21:39:28vstinnersetmessages: + msg100888
2010-03-10 13:49:02floxsetmessages: + msg100784
2010-03-10 13:47:34floxsetkeywords: + buildbot

messages: + msg100783
2010-03-10 13:45:11floxsetnosy: + pitrou, vstinner, giampaolo.rodola
2010-03-10 13:44:37floxcreate