Asyncio server hang when clients connect and immediately disconnect #71573

Closed
jimfulton mannequin opened this issue Jun 25, 2016 · 27 comments

@jimfulton
Mannequin

jimfulton mannequin commented Jun 25, 2016

BPO 27386
Nosy @gvanrossum, @vstinner, @tiran, @jimfulton, @socketpair, @1st1
Files
  • echo.py
  • echo-no-print.py
  • echo2.py
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = 'https://github.com/1st1'
    closed_at = <Date 2016-10-06.18:07:07.289>
    created_at = <Date 2016-06-25.20:56:34.480>
    labels = ['type-security', 'expert-asyncio']
    title = 'Asyncio server hang when clients connect and immediately disconnect'
    updated_at = <Date 2016-10-06.18:07:07.287>
    user = 'https://github.com/jimfulton'

    bugs.python.org fields:

    activity = <Date 2016-10-06.18:07:07.287>
    actor = 'yselivanov'
    assignee = 'yselivanov'
    closed = True
    closed_date = <Date 2016-10-06.18:07:07.289>
    closer = 'yselivanov'
    components = ['asyncio']
    creation = <Date 2016-06-25.20:56:34.480>
    creator = 'j1m'
    dependencies = []
    files = ['43580', '43581', '43582']
    hgrepos = []
    issue_num = 27386
    keywords = []
    message_count = 27.0
    messages = ['269256', '269302', '269308', '269313', '269486', '269490', '269517', '269518', '269520', '269523', '269526', '269527', '269528', '269530', '269532', '269533', '269534', '269535', '276642', '276711', '276802', '277450', '277458', '277460', '277463', '277465', '278200']
    nosy_count = 6.0
    nosy_names = ['gvanrossum', 'vstinner', 'christian.heimes', 'j1m', 'socketpair', 'yselivanov']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'security'
    url = 'https://bugs.python.org/issue27386'
    versions = ['Python 3.4']

    @jimfulton
    Mannequin Author

    jimfulton mannequin commented Jun 25, 2016

    I recently ported ZEO to asyncio.

    We'd had a bug in our old asyncore-based server where the server would hang if several connections were made and then immediately disconnected on Mac OS X. This was due to an error-handling bug in our code that we fixed. We have a regression test for this case.

    The regression test for this case fails using asyncio.Server.

    I've attached a (ZEO-independent) script that demonstrates the problem.

    If you run the script with Python 3.4 or 3.5, I expect the script will hang. It does for me on Mac OS X 10.10.5 and Ubuntu 14.04.

    @jimfulton jimfulton mannequin added the type-bug and topic-asyncio labels Jun 25, 2016
    @jimfulton
    Mannequin Author

    jimfulton mannequin commented Jun 26, 2016

    FWIW, using uvloop avoids the hang.

    @socketpair
    Mannequin

    socketpair mannequin commented Jun 26, 2016

    Please reduce the program, and make sure it still hangs.

    @gvanrossum
    Member

    Yeah, I'd like to see a more minimal repro to understand what's going on.

    @jimfulton
    Mannequin Author

    jimfulton mannequin commented Jun 29, 2016

    This is already pretty minimal. There are no external dependencies.

    @socketpair
    Mannequin

    socketpair mannequin commented Jun 29, 2016

    Please reduce it even more. I mean remove debugging, specific commands, and all extra code that is not related to the original problem.

    @socketpair
    Mannequin

    socketpair mannequin commented Jun 29, 2016

    Also, I recommend using asyncio streams instead of reinventing the wheel.

    So, reading your command will look like:

    data = await reader.readexactly(4)
    (length,) = unpack(">I", data)
    command = await reader.readexactly(length)
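
For reference, the asyncio method for reading a fixed number of bytes is `StreamReader.readexactly`. A self-contained sketch of the length-prefixed read suggested above might look like the following. The helper names `read_command`, `frame`, and `demo` are hypothetical, and the 4-byte big-endian length prefix is taken from the snippet above, not from the attached scripts.

```python
import asyncio
import struct


def frame(payload):
    """Prefix a payload with its length as a 4-byte big-endian integer."""
    return struct.pack(">I", len(payload)) + payload


async def read_command(reader):
    """Read one length-prefixed message from an asyncio StreamReader."""
    header = await reader.readexactly(4)
    (length,) = struct.unpack(">I", header)
    return await reader.readexactly(length)


async def demo():
    # Feed a framed message into an in-memory reader; no network is needed.
    reader = asyncio.StreamReader()
    reader.feed_data(frame(b"ping"))
    reader.feed_eof()
    return await read_command(reader)
```

`readexactly` raises `asyncio.IncompleteReadError` if the peer disconnects before the requested number of bytes arrives, which is relevant when clients drop the connection early.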

    @jimfulton
    Mannequin Author

    jimfulton mannequin commented Jun 29, 2016

    OK, I *was* able to simplify it a fair bit. I'm uploading a new version. I left prints in because I think you'd find them helpful, but I'll upload another version without prints.

    @socketpair
    Mannequin

    socketpair mannequin commented Jun 29, 2016

    One more thing. Why do you set socket.SO_LINGER? And why is the lingering timeout 0 seconds?

    Removing that eliminates the problem completely.
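
As background on the option being discussed: enabling SO_LINGER with a timeout of 0 makes `close()` abort the connection, typically sending a TCP RST instead of the normal FIN shutdown. A minimal sketch of a client that connects and immediately disconnects this way is shown below. It is not taken from the attached scripts; the function name `connect_and_reset` is hypothetical, and the `struct.pack("ii", ...)` layout assumes a POSIX-style `struct linger` made of two C ints (Linux, macOS).

```python
import socket
import struct


def connect_and_reset(address):
    """Connect to address, then close abortively (linger on, timeout 0)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.connect(address)
    # l_onoff=1 enables lingering, l_linger=0 sets a zero-second timeout.
    # Assumption: struct linger is two C ints, as on Linux and macOS.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER, struct.pack("ii", 1, 0))
    sock.close()
```

How the accepting side observes this varies by platform: the accept call itself may fail, or a later read on the accepted socket may report a reset.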

    @jimfulton
    Mannequin Author

    jimfulton mannequin commented Jun 29, 2016

    Here's a version sans prints

    @gvanrossum
    Member

    I can't personally run that code and get the results you are getting; could you please walk us through what happens (as far as you can tell)? Reading the code I find myself quite confused about which parts of the code might be active or not. E.g. is self.messages used? Does its actual contents matter? Where does it end up?

    @1st1
    Member

    1st1 commented Jun 29, 2016

    Jim, I think you wanted to post this link in this issue: https://bugs.launchpad.net/zodb/+bug/135108/comments/9 instead of in bpo-27392.

    I can reproduce this on my mac, but so far I've no idea what's going on.

    @jimfulton
    Mannequin Author

    jimfulton mannequin commented Jun 29, 2016

    Guido, are you saying that the script runs without hanging for you?
    (If you get a boatload of tracebacks, that's due to another asyncio bug in error handling.)

    Are you running the version with prints?

    This is an adaptation of the echo server and client from the docs.

    The server runs in a thread. It just echoes its input.

    The client just waits for a message from the server, and then sends messages (one in attached echo2.py) and waits for replies. When I run this on Mac and Ubuntu 14.04, the server never sees the messages sent by the client.

    I'm uploading a newer version that simplifies the messages data structure and adds some prints to, I think, make the sequence easier to see.

    Fixing the bug that causes all the tracebacks to be printed would also make this easier to interpret.

    Commenting out the code that makes and closes the socket connections with SO_LINGER and running echo2.py should also make it easy to see the trivial expected client/server interaction.

    I don't think the details of the interaction between the server and the client are very important, other than the fact that the client gets the first message from the server and the server doesn't get the subsequent message from the client.
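
The expected interaction described above (server greets, client sends, server echoes) can be sketched roughly as follows. This is not the attached echo2.py: the names `EchoProtocol`, `start_echo_server`, and `echo_once` are hypothetical, the `hello` greeting and newline framing are assumptions, and the SO_LINGER connections that trigger the hang are deliberately left out.

```python
import asyncio
import socket
import threading


class EchoProtocol(asyncio.Protocol):
    def connection_made(self, transport):
        self.transport = transport
        transport.write(b"hello\n")  # the server speaks first

    def data_received(self, data):
        self.transport.write(data)  # echo whatever arrives


def start_echo_server():
    """Run an echo server on its own event loop in a background thread."""
    loop = asyncio.new_event_loop()
    ready = threading.Event()
    state = {}

    def run():
        server = loop.run_until_complete(
            loop.create_server(EchoProtocol, "127.0.0.1", 0))
        state["address"] = server.sockets[0].getsockname()
        ready.set()
        loop.run_forever()  # returns once loop.stop() is scheduled
        server.close()
        loop.close()

    thread = threading.Thread(target=run, daemon=True)
    thread.start()
    ready.wait()
    return loop, thread, state["address"]


def echo_once(address, message):
    """Blocking client: read the greeting, send one line, read the echo."""
    with socket.create_connection(address, timeout=5) as sock:
        stream = sock.makefile("rb")
        greeting = stream.readline()
        sock.sendall(message)
        reply = stream.readline()
    return greeting, reply
```

Calling `echo_once(address, b"ping\n")` against the address returned by `start_echo_server()` exercises one round trip; the server thread can be stopped with `loop.call_soon_threadsafe(loop.stop)`.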

    @jimfulton
    Mannequin Author

    jimfulton mannequin commented Jun 29, 2016

    Yuri, right you are. Thanks.

    Марк, see https://bugs.launchpad.net/zodb/+bug/135108/comments/9

    @1st1
    Member

    1st1 commented Jun 29, 2016

    Running out of time to debug this today. I think this is a bug in CPython, in either the socket or the select module. When I inject some debug code in selectors.py and replace KQueue with select(), I can see that the server thread's selector stops working at some point due to an EBADF error.

    I think something similar is happening with the KQueue selector -- at some point it just stops returning events correctly.

    Again, I might be wrong about this all, but this is what I think after 2.5 hours of debugging.
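
One way to swap the selector without editing selectors.py is to build the event loop with an explicit selector. The sketch below is an illustration of that approach, not the debugging code used above; the names `new_select_loop` and `ping` are hypothetical.

```python
import asyncio
import selectors


def new_select_loop():
    """Create an asyncio event loop backed by select() instead of the
    platform default (for example kqueue on macOS or epoll on Linux)."""
    return asyncio.SelectorEventLoop(selectors.SelectSelector())


async def ping():
    # Trivial coroutine, used only to show that the loop runs.
    await asyncio.sleep(0)
    return "pong"


def run_ping():
    loop = new_select_loop()
    try:
        return loop.run_until_complete(ping())
    finally:
        loop.close()
```

Running the same reproduction on loops built with different selectors can help narrow down whether a hang is specific to one selector implementation.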

    @gvanrossum
    Member

    No, I just don't have a computer right now, only a phone.

    --Guido (mobile)

    @jimfulton
    Mannequin Author

    jimfulton mannequin commented Jun 29, 2016

    WRT CPython/sockets this problem doesn't happen if I use asyncore to accept connections and hand them off to create_connection. :)

    It also doesn't occur with uvloop, which I assume still uses sockets.

    Also, FWIW, the relevant ZEO test passes if I use SSL, which is how I'm working around this now for the tests.

    @1st1
    Member

    1st1 commented Jun 29, 2016

    > It also doesn't occur with uvloop, which I assume still uses sockets.

    No, uvloop doesn't use python sockets or select for IO at all. All IO is done in libuv.

    > WRT CPython/sockets this problem doesn't happen if I use asyncore to accept connections and hand them off to create_connection. :)

    Interesting.

    @jimfulton jimfulton mannequin added the type-security label and removed the type-bug label Jul 6, 2016
    @1st1
    Member

    1st1 commented Sep 15, 2016

    It looks like this was fixed by bpo-27759. Jim, could you please verify?

    @jimfulton
    Mannequin Author

    jimfulton mannequin commented Sep 16, 2016

    Cool, I will verify soon.

    @jimfulton
    Mannequin Author

    jimfulton mannequin commented Sep 17, 2016

    Yes, that change addresses this issue. Thanks!

    Will this be backported?

    @tiran
    Member

    tiran commented Sep 26, 2016

    Yuri, are you going to backport the fix to 3.4?

    @1st1
    Member

    1st1 commented Sep 26, 2016

    Isn't 3.4 in security fixes only mode?

    @tiran
    Member

    tiran commented Sep 26, 2016

    Jim asked for a backport. If the problem is not a security issue that needs to be backported, feel free to close the ticket.

    @jimfulton
    Mannequin Author

    jimfulton mannequin commented Sep 26, 2016

    This is arguably a security issue because it's a DoS vector.

    I don't feel strongly about it though.

    @1st1
    Member

    1st1 commented Sep 26, 2016

    > Jim asked for a backport.

    Sorry Jim, was replying from my email client, didn't see all messages.

    > This is arguably a security issue because it's a DoS vector.

    Yeah, I can see why. I can commit this to 3.4 in a week. Christian, feel free to commit this if you want this issue to be closed earlier.

    @1st1
    Member

    1st1 commented Oct 6, 2016

    Alright, I've backported the fix to 3.4. Closing this.

    @1st1 1st1 closed this as completed Oct 6, 2016
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022