classification
Title: Asyncio server hang when clients connect and immediately disconnect
Type: security Stage: resolved
Components: asyncio Versions: Python 3.4
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: yselivanov Nosy List: christian.heimes, gvanrossum, j1m, socketpair, vstinner, yselivanov
Priority: normal Keywords:

Created on 2016-06-25 20:56 by j1m, last changed 2016-10-06 18:07 by yselivanov. This issue is now closed.

Files
File name Uploaded Description Edit
echo.py j1m, 2016-06-29 15:20
echo-no-print.py j1m, 2016-06-29 15:25
echo2.py j1m, 2016-06-29 16:21
Messages (27)
msg269256 - (view) Author: Jim Fulton (j1m) * (Python committer) Date: 2016-06-25 20:56
I recently ported ZEO to asyncio.

We'd had a bug in our old asyncore-based server where the server would hang if several connections were made and then immediately disconnected on Mac OS X.  This was due to an error-handling bug in our code that we fixed.  We have a regression test for this case.

The regression test for this case fails using asyncio.Server.

I've attached a (ZEO-independent) script that demonstrates the problem.

If you run the script with Python 3.4 or 3.5, I expect the script will hang. It does for me on Mac OS X 10.10.5 and Ubuntu 14.04.
msg269302 - (view) Author: Jim Fulton (j1m) * (Python committer) Date: 2016-06-26 17:46
FWIW, using uvloop avoids the hang.
msg269308 - (view) Author: Марк Коренберг (socketpair) * Date: 2016-06-26 20:09
Please reduce program, and make sure it still hangs.
msg269313 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2016-06-26 21:37
Yeah, I'd like to see a more minimal repro to understand what's going in.
msg269486 - (view) Author: Jim Fulton (j1m) * (Python committer) Date: 2016-06-29 13:30
This is already pretty minimal. There are no external dependencies.
msg269490 - (view) Author: Марк Коренберг (socketpair) * Date: 2016-06-29 13:36
Plese reduce even more. I mean remove debugging, specifi commands, and all extra code, that is not related to original problem.
msg269517 - (view) Author: Марк Коренберг (socketpair) * Date: 2016-06-29 15:17
Also I recommend you to use asyncio streams, instead of reinventing wheels.

So, reading your command will look like:

data = await stream.read_exactly(4)
(len,) = unpack(">I", data)
command = await stream.read_exactly(len)
msg269518 - (view) Author: Jim Fulton (j1m) * (Python committer) Date: 2016-06-29 15:20
OK, I *was* able to simplify it a fair bit. I'm uploading a new version.  I left prints in because I think you'd find them helpful, but I'll upload another version without prints.
msg269520 - (view) Author: Марк Коренберг (socketpair) * Date: 2016-06-29 15:21
One more thing. Why you set socket.SO_LINGER ? and why lingering timeout is 0 seconds ?

Removing that eliminate problem completely.
msg269523 - (view) Author: Jim Fulton (j1m) * (Python committer) Date: 2016-06-29 15:25
Here's a version sans prints
msg269526 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2016-06-29 15:47
I can't personally run that code and get the results you are getting; could you please walk us through what happens (as far as you can tell)? Reading the code I find myself quite confused about which parts of the code might be active or not. E.g. is self.messages used? Does its actual contents matter? Where does it end up?
msg269527 - (view) Author: Yury Selivanov (yselivanov) * (Python committer) Date: 2016-06-29 16:10
Jim, I think you wanted to post this link in this issue: https://bugs.launchpad.net/zodb/+bug/135108/comments/9 instead of in #27392.

I can reproduce this on my mac, but so far I've no idea what's going on.
msg269528 - (view) Author: Jim Fulton (j1m) * (Python committer) Date: 2016-06-29 16:21
Guido, are you saying that the script runs without hanging for you?
(If you get a boatload of tracebacks, that's due to another asyncio bug in error handling.)

Are you running the version with prints?

This is an adaptation of the echo server and client from the docs.

The server runs in a thread.  It just echos it's input.

The client just waits for a message from the server, and then send messages (one in attached echo2.py) and waits for replies. When I run this on Mac and ubuntu 14.04, the server never sees the messages sent by the client.

I'm uploading a newer version that simplifies the messages data structure and adds some prints to, I think, make the sequence easier to see.

Fixing the bug that causes all the tracebacks to be printed would also make this easier to interpret.

Commenting out the code that makes and closes the socket connections with SO_LINGER and running echo2.py should also make it easy to see the trivial expected client/server interaction.

I don't think the details of the interaction between the server and the client are very important, other than the fact that the client gets the first message from the server and the server doesn't get the subsequent message from the client.
msg269530 - (view) Author: Jim Fulton (j1m) * (Python committer) Date: 2016-06-29 16:23
Yuri, right you are. Thanks.

Марк, see https://bugs.launchpad.net/zodb/+bug/135108/comments/9
msg269532 - (view) Author: Yury Selivanov (yselivanov) * (Python committer) Date: 2016-06-29 17:04
Running out of time to debug this today.  I think this is a bug in CPython, in either socket or select module.  When I inject some debug code in selectors.py and replace KQueue with select(), I can see that the server thread's selector stops working at some point due to a EBADF error.

I think something similar is happening with the KQueue selector -- at some point it just stops to return events correctly.

Again, I might be wrong about this all, but this is what I think after 2.5 hours of debugging.
msg269533 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2016-06-29 17:16
No, I just don't have a computer right now, only a phone.

--Guido (mobile)
msg269534 - (view) Author: Jim Fulton (j1m) * (Python committer) Date: 2016-06-29 17:25
WRT CPython/sockets this problem doesn't happen if I use asyncore to accept connections and hand them off to create_connection. :)

It also doesn't occur with uvloop, which I assume still uses sockets.

Also, FWIW, the relevant ZEO test passes if I use SSL, which is how I'm working around this now for the tests.
msg269535 - (view) Author: Yury Selivanov (yselivanov) * (Python committer) Date: 2016-06-29 18:09
> It also doesn't occur with uvloop, which I assume still uses sockets.

No, uvloop doesn't use python sockets or select for IO at all.  All IO is done in libuv.

> WRT CPython/sockets this problem doesn't happen if I use asyncore to accept connections and hand them off to create_connection. :)

Interesting.
msg276642 - (view) Author: Yury Selivanov (yselivanov) * (Python committer) Date: 2016-09-15 23:46
It looks like this was fixed by #27759!.  Jim, could you please verify?
msg276711 - (view) Author: Jim Fulton (j1m) * (Python committer) Date: 2016-09-16 13:13
Cool, I will verify soon.
msg276802 - (view) Author: Jim Fulton (j1m) * (Python committer) Date: 2016-09-17 16:33
Yes, that change addresses this issue. Thanks!

Will this be backported?
msg277450 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2016-09-26 18:57
Yuri, are you going to backport the fix to 3.4?
msg277458 - (view) Author: Yury Selivanov (yselivanov) * (Python committer) Date: 2016-09-26 20:15
Isn't 3.4 in security fixes only mode?
msg277460 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2016-09-26 20:19
Jim ask for a backport. In case the problem is not a security issue that needs to be backported, feel free to close the ticket.
msg277463 - (view) Author: Jim Fulton (j1m) * (Python committer) Date: 2016-09-26 20:22
This is arguably a security issue because it's a DoS vector.

I don't feel strongly about it though.
msg277465 - (view) Author: Yury Selivanov (yselivanov) * (Python committer) Date: 2016-09-26 20:26
> Jim ask for a backport.

Sorry Jim, was replying from my email client, didn't see all messages.

> This is arguably a security issue because it's a DoS vector.

Yeah, I can see why.  I can commit this to 3.4 in a week.  Christian, feel free to commit this if you want this issue to be closed earlier.
msg278200 - (view) Author: Yury Selivanov (yselivanov) * (Python committer) Date: 2016-10-06 18:07
Alright, I've backported the fix to 3.4. Closing this.
History
Date User Action Args
2016-10-06 18:07:07yselivanovsetstatus: open -> closed

messages: + msg278200
stage: resolved
2016-09-26 20:26:54yselivanovsetmessages: + msg277465
2016-09-26 20:22:25j1msetmessages: + msg277463
2016-09-26 20:19:07christian.heimessetmessages: + msg277460
2016-09-26 20:15:58yselivanovsetstatus: pending -> open

messages: + msg277458
2016-09-26 18:57:00christian.heimessetstatus: open -> pending

nosy: + christian.heimes
versions: - Python 3.5
messages: + msg277450

assignee: yselivanov
2016-09-17 16:33:19j1msetresolution: fixed
messages: + msg276802
2016-09-16 13:13:05j1msetmessages: + msg276711
2016-09-15 23:46:41yselivanovsetmessages: + msg276642
2016-07-06 15:29:31j1msettype: behavior -> security
2016-06-29 18:09:07yselivanovsetmessages: + msg269535
2016-06-29 17:25:47j1msetmessages: + msg269534
2016-06-29 17:16:31gvanrossumsetmessages: + msg269533
2016-06-29 17:04:47yselivanovsetmessages: + msg269532
2016-06-29 16:23:52j1msetmessages: + msg269530
2016-06-29 16:21:29j1msetfiles: + echo2.py

messages: + msg269528
2016-06-29 16:10:03yselivanovsetmessages: + msg269527
2016-06-29 15:47:29gvanrossumsetmessages: + msg269526
2016-06-29 15:25:29j1msetfiles: + echo-no-print.py

messages: + msg269523
2016-06-29 15:21:09socketpairsetmessages: + msg269520
2016-06-29 15:20:23j1msetfiles: - echo.py
2016-06-29 15:20:08j1msetfiles: + echo.py

messages: + msg269518
2016-06-29 15:17:40socketpairsetmessages: + msg269517
2016-06-29 13:36:38socketpairsetmessages: + msg269490
2016-06-29 13:30:35j1msetmessages: + msg269486
2016-06-26 21:37:56gvanrossumsetmessages: + msg269313
2016-06-26 20:09:35socketpairsetnosy: + socketpair
messages: + msg269308
2016-06-26 17:46:10j1msetmessages: + msg269302
2016-06-25 20:56:34j1mcreate