msg269256 - (view) |
Author: Jim Fulton (j1m) * |
Date: 2016-06-25 20:56 |
I recently ported ZEO to asyncio.
We'd had a bug in our old asyncore-based server where the server would hang if several connections were made and then immediately disconnected on Mac OS X. This was due to an error-handling bug in our code that we fixed. We have a regression test for this case.
The regression test for this case fails using asyncio.Server.
I've attached a (ZEO-independent) script that demonstrates the problem.
If you run the script with Python 3.4 or 3.5, I expect the script will hang. It does for me on Mac OS X 10.10.5 and Ubuntu 14.04.
|
msg269302 - (view) |
Author: Jim Fulton (j1m) * |
Date: 2016-06-26 17:46 |
FWIW, using uvloop avoids the hang.
|
msg269308 - (view) |
Author: Марк Коренберг (socketpair) * |
Date: 2016-06-26 20:09 |
Please reduce program, and make sure it still hangs.
|
msg269313 - (view) |
Author: Guido van Rossum (gvanrossum) * |
Date: 2016-06-26 21:37 |
Yeah, I'd like to see a more minimal repro to understand what's going in.
|
msg269486 - (view) |
Author: Jim Fulton (j1m) * |
Date: 2016-06-29 13:30 |
This is already pretty minimal. There are no external dependencies.
|
msg269490 - (view) |
Author: Марк Коренберг (socketpair) * |
Date: 2016-06-29 13:36 |
Plese reduce even more. I mean remove debugging, specifi commands, and all extra code, that is not related to original problem.
|
msg269517 - (view) |
Author: Марк Коренберг (socketpair) * |
Date: 2016-06-29 15:17 |
Also I recommend you to use asyncio streams, instead of reinventing wheels.
So, reading your command will look like:
data = await stream.read_exactly(4)
(len,) = unpack(">I", data)
command = await stream.read_exactly(len)
|
msg269518 - (view) |
Author: Jim Fulton (j1m) * |
Date: 2016-06-29 15:20 |
OK, I *was* able to simplify it a fair bit. I'm uploading a new version. I left prints in because I think you'd find them helpful, but I'll upload another version without prints.
|
msg269520 - (view) |
Author: Марк Коренберг (socketpair) * |
Date: 2016-06-29 15:21 |
One more thing. Why you set socket.SO_LINGER ? and why lingering timeout is 0 seconds ?
Removing that eliminate problem completely.
|
msg269523 - (view) |
Author: Jim Fulton (j1m) * |
Date: 2016-06-29 15:25 |
Here's a version sans prints
|
msg269526 - (view) |
Author: Guido van Rossum (gvanrossum) * |
Date: 2016-06-29 15:47 |
I can't personally run that code and get the results you are getting; could you please walk us through what happens (as far as you can tell)? Reading the code I find myself quite confused about which parts of the code might be active or not. E.g. is self.messages used? Does its actual contents matter? Where does it end up?
|
msg269527 - (view) |
Author: Yury Selivanov (yselivanov) * |
Date: 2016-06-29 16:10 |
Jim, I think you wanted to post this link in this issue: https://bugs.launchpad.net/zodb/+bug/135108/comments/9 instead of in #27392.
I can reproduce this on my mac, but so far I've no idea what's going on.
|
msg269528 - (view) |
Author: Jim Fulton (j1m) * |
Date: 2016-06-29 16:21 |
Guido, are you saying that the script runs without hanging for you?
(If you get a boatload of tracebacks, that's due to another asyncio bug in error handling.)
Are you running the version with prints?
This is an adaptation of the echo server and client from the docs.
The server runs in a thread. It just echos it's input.
The client just waits for a message from the server, and then send messages (one in attached echo2.py) and waits for replies. When I run this on Mac and ubuntu 14.04, the server never sees the messages sent by the client.
I'm uploading a newer version that simplifies the messages data structure and adds some prints to, I think, make the sequence easier to see.
Fixing the bug that causes all the tracebacks to be printed would also make this easier to interpret.
Commenting out the code that makes and closes the socket connections with SO_LINGER and running echo2.py should also make it easy to see the trivial expected client/server interaction.
I don't think the details of the interaction between the server and the client are very important, other than the fact that the client gets the first message from the server and the server doesn't get the subsequent message from the client.
|
msg269530 - (view) |
Author: Jim Fulton (j1m) * |
Date: 2016-06-29 16:23 |
Yuri, right you are. Thanks.
Марк, see https://bugs.launchpad.net/zodb/+bug/135108/comments/9
|
msg269532 - (view) |
Author: Yury Selivanov (yselivanov) * |
Date: 2016-06-29 17:04 |
Running out of time to debug this today. I think this is a bug in CPython, in either socket or select module. When I inject some debug code in selectors.py and replace KQueue with select(), I can see that the server thread's selector stops working at some point due to a EBADF error.
I think something similar is happening with the KQueue selector -- at some point it just stops to return events correctly.
Again, I might be wrong about this all, but this is what I think after 2.5 hours of debugging.
|
msg269533 - (view) |
Author: Guido van Rossum (gvanrossum) * |
Date: 2016-06-29 17:16 |
No, I just don't have a computer right now, only a phone.
--Guido (mobile)
|
msg269534 - (view) |
Author: Jim Fulton (j1m) * |
Date: 2016-06-29 17:25 |
WRT CPython/sockets this problem doesn't happen if I use asyncore to accept connections and hand them off to create_connection. :)
It also doesn't occur with uvloop, which I assume still uses sockets.
Also, FWIW, the relevant ZEO test passes if I use SSL, which is how I'm working around this now for the tests.
|
msg269535 - (view) |
Author: Yury Selivanov (yselivanov) * |
Date: 2016-06-29 18:09 |
> It also doesn't occur with uvloop, which I assume still uses sockets.
No, uvloop doesn't use python sockets or select for IO at all. All IO is done in libuv.
> WRT CPython/sockets this problem doesn't happen if I use asyncore to accept connections and hand them off to create_connection. :)
Interesting.
|
msg276642 - (view) |
Author: Yury Selivanov (yselivanov) * |
Date: 2016-09-15 23:46 |
It looks like this was fixed by #27759!. Jim, could you please verify?
|
msg276711 - (view) |
Author: Jim Fulton (j1m) * |
Date: 2016-09-16 13:13 |
Cool, I will verify soon.
|
msg276802 - (view) |
Author: Jim Fulton (j1m) * |
Date: 2016-09-17 16:33 |
Yes, that change addresses this issue. Thanks!
Will this be backported?
|
msg277450 - (view) |
Author: Christian Heimes (christian.heimes) * |
Date: 2016-09-26 18:57 |
Yuri, are you going to backport the fix to 3.4?
|
msg277458 - (view) |
Author: Yury Selivanov (yselivanov) * |
Date: 2016-09-26 20:15 |
Isn't 3.4 in security fixes only mode?
|
msg277460 - (view) |
Author: Christian Heimes (christian.heimes) * |
Date: 2016-09-26 20:19 |
Jim ask for a backport. In case the problem is not a security issue that needs to be backported, feel free to close the ticket.
|
msg277463 - (view) |
Author: Jim Fulton (j1m) * |
Date: 2016-09-26 20:22 |
This is arguably a security issue because it's a DoS vector.
I don't feel strongly about it though.
|
msg277465 - (view) |
Author: Yury Selivanov (yselivanov) * |
Date: 2016-09-26 20:26 |
> Jim ask for a backport.
Sorry Jim, was replying from my email client, didn't see all messages.
> This is arguably a security issue because it's a DoS vector.
Yeah, I can see why. I can commit this to 3.4 in a week. Christian, feel free to commit this if you want this issue to be closed earlier.
|
msg278200 - (view) |
Author: Yury Selivanov (yselivanov) * |
Date: 2016-10-06 18:07 |
Alright, I've backported the fix to 3.4. Closing this.
|
|
Date |
User |
Action |
Args |
2022-04-11 14:58:33 | admin | set | github: 71573 |
2016-10-06 18:07:07 | yselivanov | set | status: open -> closed
messages:
+ msg278200 stage: resolved |
2016-09-26 20:26:54 | yselivanov | set | messages:
+ msg277465 |
2016-09-26 20:22:25 | j1m | set | messages:
+ msg277463 |
2016-09-26 20:19:07 | christian.heimes | set | messages:
+ msg277460 |
2016-09-26 20:15:58 | yselivanov | set | status: pending -> open
messages:
+ msg277458 |
2016-09-26 18:57:00 | christian.heimes | set | status: open -> pending
nosy:
+ christian.heimes versions:
- Python 3.5 messages:
+ msg277450
assignee: yselivanov |
2016-09-17 16:33:19 | j1m | set | resolution: fixed messages:
+ msg276802 |
2016-09-16 13:13:05 | j1m | set | messages:
+ msg276711 |
2016-09-15 23:46:41 | yselivanov | set | messages:
+ msg276642 |
2016-07-06 15:29:31 | j1m | set | type: behavior -> security |
2016-06-29 18:09:07 | yselivanov | set | messages:
+ msg269535 |
2016-06-29 17:25:47 | j1m | set | messages:
+ msg269534 |
2016-06-29 17:16:31 | gvanrossum | set | messages:
+ msg269533 |
2016-06-29 17:04:47 | yselivanov | set | messages:
+ msg269532 |
2016-06-29 16:23:52 | j1m | set | messages:
+ msg269530 |
2016-06-29 16:21:29 | j1m | set | files:
+ echo2.py
messages:
+ msg269528 |
2016-06-29 16:10:03 | yselivanov | set | messages:
+ msg269527 |
2016-06-29 15:47:29 | gvanrossum | set | messages:
+ msg269526 |
2016-06-29 15:25:29 | j1m | set | files:
+ echo-no-print.py
messages:
+ msg269523 |
2016-06-29 15:21:09 | socketpair | set | messages:
+ msg269520 |
2016-06-29 15:20:23 | j1m | set | files:
- echo.py |
2016-06-29 15:20:08 | j1m | set | files:
+ echo.py
messages:
+ msg269518 |
2016-06-29 15:17:40 | socketpair | set | messages:
+ msg269517 |
2016-06-29 13:36:38 | socketpair | set | messages:
+ msg269490 |
2016-06-29 13:30:35 | j1m | set | messages:
+ msg269486 |
2016-06-26 21:37:56 | gvanrossum | set | messages:
+ msg269313 |
2016-06-26 20:09:35 | socketpair | set | nosy:
+ socketpair messages:
+ msg269308
|
2016-06-26 17:46:10 | j1m | set | messages:
+ msg269302 |
2016-06-25 20:56:34 | j1m | create | |