classification
Title: asyncio missing wrap_socket (starttls)
Type: enhancement Stage: needs patch
Components: asyncio Versions: Python 3.7
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: yselivanov Nosy List: Alex Grönholm, Elizacat, Frzk, alex.gronholm, asvetlov, barry, christian.heimes, gc, msornay, python-dev, siemer, sphawk, vstinner, yselivanov
Priority: normal Keywords: patch

Created on 2015-03-23 13:59 by gc, last changed 2017-09-07 20:46 by christian.heimes.

Files
File name Uploaded Description Edit
tls1.patch yselivanov, 2015-10-26 20:19 review
Messages (30)
msg239021 - (view) Author: Giovanni Cannata (gc) Date: 2015-03-23 13:59
It's not possible to wrap a socket in tls. The StreamWriter object should have an option to start a tls negotiation using the SSLContext of the server.

This is needed for protocols the have a "start_tls" feature, for example the ldap protocol.

In a non async program it's very easy: 
   wrapped_socket = ssl_context.wrap_socket(connection.socket, server_side=True, do_handshake_on_connect=True)

there should be something similar in the StreamWriter interface:
   yield from writer.wrap_socket()

Bye,
Giovanni
msg239022 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2015-03-23 14:01
Yes, it's not supported yet. It was already requested in this issue:
https://code.google.com/p/tulip/issues/detail?id=79

asyncio got a new SSL implementation which makes possible to implement STARTTLS. Are you interested to implement it?
msg239026 - (view) Author: Giovanni Cannata (gc) Date: 2015-03-23 14:20
Thanks, I will look to the new implementation of ssl in 3.5, and try to adapt it for my project (sldap3). I'd like to help, but I'm not skilled in asynchronous programming so I'm not sure if I succeed.

Bye,
Giovanni
msg242095 - (view) Author: Elizabeth Myers (Elizacat) * Date: 2015-04-27 03:59
What needs to be done to make this happen? I can try to implement it.
msg242097 - (view) Author: Elizabeth Myers (Elizacat) * Date: 2015-04-27 04:08
For what it's worth, IRC has an optional STARTTLS extension which is implemented by some servers. IMAP and SMTP also have STARTTLS as a fundamental component of their protocols. As gc pointed out, LDAP also supports it.

IMO this is a pretty glaring omission.
msg242136 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2015-04-27 17:43
We didn't do this originally because the 3.4 SSL module didn't have this functionality (or maybe it was 3.3 that didn't have this) but now that we do have it I'd be very happy if you could implement this!

I'm not sure what the right interface is, probably coroutine methods on StreamReader and StreamWriter that take an SSLContext object (and from then on the reader/writer is using SSL), but there also would have to be a lower-level way to do the same thing to a Transport. This would probably have to return a new Transport that uses the original, wrapped transport for reading/writing.

You probably should write a small test app that proves this works for real too. Perhaps start with a synchronous test app that uses the existing wrap_socket() and then work on the async interface until you can reproduce the same thing there.

Let us know if you need more information.
msg242155 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2015-04-27 20:10
So you need to:

- have an API to wrap a clear-text protocol in a SSL protocol (see e.g. BaseProactorEventLoop._make_ssl_transport()... note how there's a waiter argument, what should be done with that?)

- be able to replace a protocol with another on the transport (perhaps using a new, optional Transport API?)

- perhaps a higher-level API that combines the two

Also for convenience a protocol should probably be able to inspect whether it has *already* been wrapped.
msg242193 - (view) Author: Elizabeth Myers (Elizacat) * Date: 2015-04-28 18:01
It seems pretty simple to just make a function that returns a new transport, something like "transport = yield from loop.ssl_wrap_transport(transport)". I'm not sure how to handle plaintext data left on the wire, though, unless that's not really a consideration (given most (all?) real-world protocols can (and usually do) wait for the SSL handshake before sending more data when STARTTLS has been requested).

For the higher-level API, I'm thinking "reader, writer = asyncio.ssl_wrap(reader, writer)" maybe? You can't have half-closed SSL connections, so you would have to pass them both in. 

As for replacing the protocol but keeping the transport, what would be the semantics of that? I can't really think of how to do that one. I do know SMTP clears all state, but some protocols might not (IRC is a key example - this isn't usually a problem since you are supposed to negotiate it early on before you log onto the server), so this shouldn't be mandatory.
msg242194 - (view) Author: Elizabeth Myers (Elizacat) * Date: 2015-04-28 18:21
Reading the source now (just woke up, sorry!), the new protocol thing makes sense. I'm not sure what to do with the waiter argument or how to handle that.

What I'm really trying to think of here is how to handle copying of state. I guess users will just have to do it by hand if they want to do that? I don't know if keeping the same protocol is practical and just wrapping the transport is a good idea :(.
msg242195 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2015-04-28 18:26
> As for replacing the protocol but keeping the transport, what would
> be the semantics of that?

The protocol is not really replaced, it's wrapped.

Before:

  SocketTransport <- UserProtocol

After:

  SocketTransport <- (asyncio.sslproto.SSLProtocol
     <- asyncio.sslproto._SSLProtocolTransport) <- UserProtocol

That way, the same SocketTransport (but it could be something else, e.g. a pipe transport) is always bound to the event loop; we simply insert a processing layer in the chain between the original transport and the final protocol.  There are two distinct objects so that the SocketTransport sees a protocol and the UserProtocol sees a transport; but those two objects work hand in hand.
msg242197 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2015-04-28 18:29
That sounds like a good plan for the top-level APIs.

But I think we may need to think about low-level APIs that handle Transports and Protocols as well.

The design I had been thinking of does not do any socket-level manipulation (in fact it doesn't care if the Transport has a socket or some other way to connect to a networking peer) but it does require some cooperation of the Transport and Protocol.

Let me draw some ASCII art.

In the initial state (no ssl) we have an App embodied in a Protocol, talking to a Transport which abstracts away a network connection:

                       data_received()
                    [ <--------------- ]
  App <==> Protocol [                  ] Transport <==> Network
                    [ ---------------> ]
                           write()

(I.e., when the app wants to write to the network, it calls transport.write(); when the network has data for the app, it calls protocol.data_received().)

In the final state (ssl established) I was thinking the picture would look like this (leaving the method names out):


                   [ <---- ]                                    [ <---- ]
  App <=> Protocol [       ] HelperTransport <=> HelperProtocol [       ] Transport <=> Network
                   [ ----> ]                                    [ ----> ]

Here the Protocol at the far left and the Transport at the far right are the same objects as in the first diagram; however we've inserted a pair of helpers that handle SSL. Once the SSL connection is flowing, a write() by the app gives the data to the helper, which gives it to the SSL library. When the SSL library wants to send some (encrypted etc.) data to the network it calls write() on the rightmost Transport (the original one, which still owns the network connection). Conversely, when data arrives over the network, it is still given to the rightmost Transport via its data_received() method. This Transport then gives it to the SSL library, which eventually decrypts it (etc.) and gives it to the helper, which then calls data_received() with the decrypted plaintext on the leftmost Protocol (i.e. the App).

The clever trick here is that the Protocol on the left still talks to a Transport, it's just a different Transport (owned by the helpr); similarly, the Transport on the right still talks to a Protocol, it's just a different one owned by the helper.

People have proposed general stacks of Protocol/Transport pairs, but so far I haven't seen much of a use case for that. This might be that use case. In order to switch the arrangements, the helper code (which is ultimately invoked by your loop.ssl_wrap_transport() call) must first create the HelperTransport and HelperProtocol halves of the SSL helper layer, and then call set_transport()/set_protocol() on the existing App Protocol and Network Transport to change their respective associated transport and protocol.

Note that the relationship between HelperTransport and HelperProtocol is not the same as that between associated Transport/Protocol pairs; the interface they use to talk to each other is completely internal to the implementation of the helper (and probably determined mostly by the needs of the underlying SSL BIO interface).

Hope this helps (and hope others on the issue agree).

--Guido
msg242198 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2015-04-28 18:30
> Reading the source now (just woke up, sorry!), the new protocol thing makes sense

Good :-)

> I'm not sure what to do with the waiter argument or how to handle that.

I'm not sure. Apparently it's used for create_connection(), so perhaps it's not necessary here? Perhaps Victor or Guido can give some insight...

> What I'm really trying to think of here is how to handle copying of state

Well, there shouldn't be any copying necessary. The user just continues using the same protocol instance; it just calls a different transport.
msg242199 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2015-04-28 18:36
Looks like Antoine drew the same diagram but quicker. :-)

Regarding the waiter arg, you can leave that None if you don't need it. It is intended to give the caller a way to block (using event loop machinery) until the connection is ready. But if your caller doesn't need it that's fine.
msg245532 - (view) Author: Elizabeth Myers (Elizacat) * Date: 2015-06-19 22:48
After giving this a look over, I think this is over my head. Sorry.
msg248048 - (view) Author: Yury Selivanov (yselivanov) * (Python committer) Date: 2015-08-05 15:45
I'm working on porting pypostgresql (pure python postgresql driver) library to use asyncio as its underlying IO machinery.  And it appears that PQ3 protocol starts as clear text, and then upgrades to use TLS (if server or connection configured so).

I've been experimenting with various approaches to how we can design an API for this, and below are some of my thoughts:

1. The API cannot be implemented on event loop. Loops generally know nothing about the internal structure of transports, i.e. what loop or protocol the transport is attached to.

2. The API cannot be implemented on protocols. Protocols are decoupled from transports (they only receive a reference to the corresponding transport in their connection_made method). Access to the transport is requires to create an SSL proxy transport/protocol pair.

3. Therefore, the most convenient place to add the new API are *transports*. I propose to add a 'start_ssl' method to transports with the following signature:

        def start_ssl(self, sslcontext=None,
                      server_side=False, server_hostname=None) -> Transport:

It will only be implemented on Python 3.5 (because of SSL MemoryBIO requirement).

Protocols can call 'start_ssl' any time after 'connection_made' is called. 'start_ssl' returns a new Transport (ssl proxy) that has to be used from that moment on.  In case the SSL handshake fails, protocol's 'connection_lost' method will be called.
msg248051 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2015-08-05 17:15
Why does the start_tls() function need to know the internal structure of the Transport? I'm hesitant to add this API to Transport; it somehow feels wrong to put such an implementation-specific thing there. E.g. I presume you can't do this for an UDP transport. Or perhaps it could be an API on a subclass of Transport -- then only members of that subclass will support this API.
msg248053 - (view) Author: Yury Selivanov (yselivanov) * (Python committer) Date: 2015-08-05 17:38
> Why does the start_tls() function need to know the internal structure of the Transport? 

If start_tls() is added to the loop, it will (likely) have the following signature:

    loop.start_tls(transport)

then I'd want it to check if the `transport` is on the same event loop, and after that we'll need to patch 'transport._protocol' with an `SSLProtocol` wrapper.  This requires adding 'get_loop()', 'get_protocol()' and 'set_protocol()' methods to transports (the latter one is kind of useless for anything but 'start_tls').

We can't implement 'loop.start_tls(protocol)', because protocols don't store a reference to their transports.

> I'm hesitant to add this API to Transport; it somehow feels wrong to put such an implementation-specific thing there. E.g. I presume you can't do this for an UDP transport. Or perhaps it could be an API on a subclass of Transport -- then only members of that subclass will support this API.

We can add a special subclass of Transport -- TLSTransport (that's how it's done in Twisted, btw: http://goo.gl/iAziWY) with 'start_tls' method raising 'NotImplementedError'.  We can later inherit _SelectorSocketTransport and _ProactorSocketTransport classes from it, implementing the method in Python 3.5.
msg248054 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2015-08-05 17:49
OK, got it. SGTM.
msg252513 - (view) Author: Elizabeth Myers (Elizacat) * Date: 2015-10-08 05:43
> Therefore, the most convenient place to add the new API are *transports*.

I had an inkling this was the case, but I didn't know how to go about the creation of a new protocol and transport pair.

> I'm hesitant to add this API to Transport; it somehow feels wrong to put such an implementation-specific thing there. E.g. I presume you can't do this for an UDP transport.

DTLS (basically TLS over any datagram-oriented protocol, including UDP, SCTP, etc.) exists, so this makes sense, although I don't know if asyncio supports it, but the only major protocol I can think of that uses DTLS is WebRTC.

In any case, it could potentially make sense for other transport types, if not now, then in the future.
msg253495 - (view) Author: Yury Selivanov (yselivanov) * (Python committer) Date: 2015-10-26 20:19
Guido, Victor,

Please find attached a first draft of the patch.  It's a very early attempt (i.e. I'm not including unit tests/docstrings), and its primary purpose is to gather initial feedback.

Some points:

1. As discussed earlier, the primary API point is new transports.TLSTransport class with a `start_tls(sslcontetx, *, server_side=False, server_hostname=None)` method.

2. While experimenting with the code and unit tests, I thought that it would be great if stream writers could do start_tls too, this patch has that too.  I like this new idea -- makes it so much simpler to write protocols.
msg253496 - (view) Author: Yury Selivanov (yselivanov) * (Python committer) Date: 2015-10-26 20:22
Here's an example client implementation with writer.start_tls() (taken from my debug code):

        @asyncio.coroutine
        def client(addr):
            reader, writer = yield from asyncio.open_connection(
                *addr, loop=loop)

            print("CLIENT: ", (yield from reader.readexactly(4)))
            writer.write(b'ehlo')
            yield from writer.start_tls(sslctx)
            # encrypted channel from this point
            print("CLIENT: ", (yield from reader.readexactly(4)))
msg253888 - (view) Author: Yury Selivanov (yselivanov) * (Python committer) Date: 2015-11-02 00:46
Guido, Victor, any thoughts about the (proto-)patch?
msg265494 - (view) Author: Yury Selivanov (yselivanov) * (Python committer) Date: 2016-05-13 21:44
I'll create a PR on the GitHub for this.  I like the proposed design, and I've implemented an SSL test micro-framework that we can use to test starttls in asyncio.
msg269417 - (view) Author: Sewon Park (sphawk) Date: 2016-06-28 05:41
https://bugs.python.org/review/23749/#msg1
yuri, did you saw guido added review on your patch?
msg269439 - (view) Author: Yury Selivanov (yselivanov) * (Python committer) Date: 2016-06-28 15:49
> yuri, did you saw guido added review on your patch?

Yes.  There are few more issues with the patch that I want to resolve before re-submitting it for another review.  Will do it soon.
msg276790 - (view) Author: Alex Grönholm (Alex Grönholm) Date: 2016-09-17 12:46
So is this going to make it into 3.6...?
msg278161 - (view) Author: Roundup Robot (python-dev) Date: 2016-10-05 23:40
New changeset 3771a6326725 by Yury Selivanov in branch '3.5':
asyncio: Add "call_connection_made" arg to SSLProtocol.__init__
https://hg.python.org/cpython/rev/3771a6326725

New changeset 3e6739e5c2d0 by Yury Selivanov in branch '3.6':
Merge 3.5 (issue #23749)
https://hg.python.org/cpython/rev/3e6739e5c2d0

New changeset f2204eaba685 by Yury Selivanov in branch 'default':
Merge 3.6 (issue #23749)
https://hg.python.org/cpython/rev/f2204eaba685
msg278162 - (view) Author: Yury Selivanov (yselivanov) * (Python committer) Date: 2016-10-05 23:42
With the latest change it's possible to implement starttls
as a separate package on PyPI, or even by copying/pasting a small
snipped of code in your project.

It's expected that we'll figure out the API design for starttls
during 3.6, so that we can add it in 3.7.

This issue should be kept open until we have a full public API
for starttls in asyncio.
msg293912 - (view) Author: Barry A. Warsaw (barry) * (Python committer) Date: 2017-05-18 06:29
I'm very interested in this because, even though we do support STARTTLS in aiosmtpd, it's a hack using non-public symbols, and we have a hidden traceback!  (I.e. one that doesn't cause the test suite to fail, but only shows up when clients disconnect.)

Here's our STARTTLS implementation (at least as of this writing): 

https://github.com/aio-libs/aiosmtpd/blob/master/aiosmtpd/smtp.py#L361

And here's the bug description:

https://github.com/aio-libs/aiosmtpd/issues/83

We're getting eof_received() *after* connection_lost()!

And the "fix":

https://github.com/aio-libs/aiosmtpd/pull/101/files

Basically, once we flip the protocol to the SSLProtocol and then munge the transport, we have to keep the original transport around so that we can close that explicitly on connection_lost().

I don't really know whether this is 1) the right way to implement STARTTLS, and 2) to handle the traceback fix given the APIs we have to work with today (Python 3.4-3.6).  But that's the problem right? :)
msg301628 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2017-09-07 20:46
I'm removing myself and drop the SSL component. It's really a feature request for asyncio.
History
Date User Action Args
2017-09-07 20:46:50christian.heimessetassignee: christian.heimes -> yselivanov
messages: + msg301628
components: - SSL
2017-05-18 06:29:41barrysetmessages: + msg293912
2017-05-17 23:25:18barrysetnosy: + barry
2016-10-24 15:06:14gvanrossumsetnosy: - gvanrossum
2016-10-24 14:01:18Frzksetnosy: + Frzk
2016-10-05 23:42:00yselivanovsetmessages: + msg278162
versions: - Python 3.5, Python 3.6
2016-10-05 23:40:47python-devsetnosy: + python-dev
messages: + msg278161
2016-09-17 12:46:10Alex Grönholmsetnosy: + Alex Grönholm
messages: + msg276790
2016-09-15 08:07:27christian.heimessetassignee: christian.heimes

nosy: + christian.heimes
components: + SSL
versions: + Python 3.6, Python 3.7, - Python 3.4
2016-06-28 15:49:24yselivanovsetmessages: + msg269439
2016-06-28 05:41:23sphawksetnosy: + sphawk
messages: + msg269417
2016-05-13 21:44:19yselivanovsetmessages: + msg265494
2016-03-21 22:37:18siemersetnosy: + siemer
2015-11-02 00:46:42yselivanovsetmessages: + msg253888
2015-10-26 20:22:13yselivanovsetmessages: + msg253496
2015-10-26 20:19:21yselivanovsetfiles: + tls1.patch
keywords: + patch
messages: + msg253495
2015-10-08 05:43:54Elizacatsetnosy: + Elizacat
messages: + msg252513
2015-10-01 12:15:11msornaysetnosy: + msornay
2015-08-14 23:13:46pitrousetnosy: - pitrou
2015-08-12 18:25:45alex.gronholmsetnosy: + alex.gronholm
2015-08-05 17:49:56gvanrossumsetmessages: + msg248054
2015-08-05 17:38:37yselivanovsetmessages: + msg248053
2015-08-05 17:15:53gvanrossumsetmessages: + msg248051
2015-08-05 15:45:48yselivanovsetmessages: + msg248048
2015-07-28 15:09:04Elizacatsetnosy: - Elizacat
2015-06-23 21:13:47vstinnersettitle: asyncio missing wrap_socket -> asyncio missing wrap_socket (starttls)
2015-06-19 22:48:31Elizacatsetmessages: + msg245532
2015-04-28 18:36:55gvanrossumsetmessages: + msg242199
2015-04-28 18:30:45pitrousetmessages: + msg242198
2015-04-28 18:29:25gvanrossumsetmessages: + msg242197
2015-04-28 18:26:09pitrousetmessages: + msg242195
2015-04-28 18:21:43Elizacatsetmessages: + msg242194
2015-04-28 18:01:28Elizacatsetmessages: + msg242193
2015-04-28 14:34:19asvetlovsetnosy: + asvetlov
stage: needs patch

versions: + Python 3.5
2015-04-27 20:10:08pitrousetnosy: + pitrou
messages: + msg242155
2015-04-27 17:43:20gvanrossumsetmessages: + msg242136
2015-04-27 04:08:16Elizacatsetmessages: + msg242097
2015-04-27 03:59:11Elizacatsetnosy: + Elizacat
messages: + msg242095
2015-03-23 14:20:26gcsetmessages: + msg239026
2015-03-23 14:01:06vstinnersetmessages: + msg239022
2015-03-23 13:59:59gccreate