This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: asyncio.StreamReader.read hangs for reused socket file descriptors when asyncio.StreamWriter.close() is not called
Type: behavior Stage:
Components: asyncio Versions: Python 3.9
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: asvetlov, ntc2, yselivanov
Priority: normal Keywords:

Created on 2021-08-02 07:45 by ntc2, last changed 2022-04-11 14:59 by admin.

Files
File name Uploaded Description Edit
server.py ntc2, 2021-08-02 07:45 Example server code
client.py ntc2, 2021-08-02 07:45 Example client code
Messages (3)
msg398731 - (view) Author: Nathan Collins (ntc2) * Date: 2021-08-02 07:45
Problem
=======

When using asyncio streams via

    (r,w) = asyncio.open_connection(sock=socket)

with a already connected socket `socket`, if you call `socket.close()`
but not `w.close()` when you're done, then when the OS later reuses
the file descriptor of `socket` for a new socket, and that new socket
is used with

    (r,w) = asyncio.open_connection(sock=socket)

again, the `r.read(...)` for the new `r` can hang indefinitely, even
when data is available on the underlying socket. When the hang
happens, closing the socket on the writer side doesn't help, and the
socket gets stuck forever in the CLOSE_WAIT state on the reader
side. Using `strace` shows that the reader side is stuck in
`epoll_wait(...)`.

Client and server programs that reproduce the bug
=================================================

Run the server in one shell and then run the client in the other
shell. They each take one argument, that controls how they close their
sockets/streams when they're done.

Usage: python3 client.py CLOSE_MODE
Usage: python3 server.py CLOSE_MODE

Where CLOSE_MODE can be
* "": don't close the socket in any way
* "S": close `open_connection` socket using `socket.socket.close()`
* "A": close `open_connection` socket using `asyncio.StreamWriter.close()`
* "SA": close `open_connection` socket both ways

These are also attached, but here's the source.

The `client.py`:

``` python
import asyncio, socket, sys

async def client(src_ip, close_mode):
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    src_ip_port = (src_ip, 0)
    s.bind(src_ip_port)
    dst_ip_port = ('127.0.0.2', 12345)
    s.connect(dst_ip_port)
    print(f'Connected from {src_ip}')
    print(s)
    try:
        (r,w) = await asyncio.open_connection(sock=s)
        print('<- ', end='', flush=True)
        in_line = await r.read(100)
        print(in_line)
        out_line = b'client'
        print('-> ', end='', flush=True)
        w.write(out_line)
        await w.drain()
        print(out_line)
    finally:
        if 'S' in close_mode:
            s.close()
        if 'A' in close_mode:
            w.close()
            await w.wait_closed()
        print('Closed socket')
        print()

async def main(close_mode):
    await client('127.0.0.3', close_mode)
    await client('127.0.0.4', close_mode)
close_mode = sys.argv[1]
asyncio.run(main(close_mode))
```

The `server.py`:

``` python
import asyncio, socket, sys

async def server(close_mode):
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    ip = '127.0.0.2'
    port = 12345
    print(f'Listening on {ip}:{port}')
    print(s)
    try:
        s.bind((ip, port))
        s.listen()
        while True:
            (a, (a_ip, a_port)) = s.accept()
            print(f'Client connected from {a_ip}:{a_port}')
            print(a)
            try:
                (r,w) = await asyncio.open_connection(sock=a)
                print('-> ', end='', flush=True)
                out_line = b'server'
                w.write(out_line)
                await w.drain()
                print(out_line)
                print('<- ', end='', flush=True)
                in_line = await r.read(100)
                print(in_line)
            finally:
                if 'S' in close_mode:
                    a.close()
                if 'A' in close_mode:
                    w.close()
                    await w.wait_closed()
                print('Closed client socket')
                print()
    finally:
        s.close()
        print('Closed server socket')

close_mode = sys.argv[1]
asyncio.run(server(close_mode))
```

Example session: `server.py S` and `client.py A`
================================================

Note that file descriptor 7 is reused on the server side, before the
server hangs on `r.read`.

Run the server in one shell:

    $ python3 server.py S
    Listening on 127.0.0.2:12345
    <socket.socket fd=6, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=0, laddr=('0.0.0.0', 0)>
    Client connected from 127.0.0.3:34135
    <socket.socket fd=7, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=0, laddr=('127.0.0.2', 12345), raddr=('127.0.0.3', 34135)>
    -> b'server'
    <- b'client'
    Closed client socket
    
    Client connected from 127.0.0.4:46639
    <socket.socket fd=7, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=0, laddr=('127.0.0.2', 12345), raddr=('127.0.0.4', 46639)>
    -> b'server'
    <-

The server is hanging on `r.read` above.

Run the client in another shell, after starting the server:

    $ python3 client.py A
    Connected from 127.0.0.3
    <socket.socket fd=6, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=0, laddr=('127.0.0.3', 34135), raddr=('127.0.0.2', 12345)>
    <- b'server'
    -> b'client'
    Closed socket

    Connected from 127.0.0.4
    <socket.socket fd=6, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=0, laddr=('127.0.0.4', 46639), raddr=('127.0.0.2', 12345)>
    <- b'server'
    -> b'client'
    Closed socket

    $ lsof -ni @127.0.0.2
    COMMAND   PID     USER   FD   TYPE  DEVICE SIZE/OFF NODE NAME
    python3 26692 conathan    6u  IPv4 9992763      0t0  TCP 127.0.0.2:12345 (LISTEN)
    python3 26692 conathan    7u  IPv4 9991149      0t0  TCP 127.0.0.2:12345->127.0.0.4:46639 (CLOSE_WAIT)

Example session: `server.py ''` and `client.py A`
================================================

Note that file descriptors are not reused on the server side now, and
nothing hangs.

Server:

    $ python3 server.py ''
    Listening on 127.0.0.2:12345
    <socket.socket fd=6, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=0, laddr=('0.0.0.0', 0)>
    Client connected from 127.0.0.3:37833
    <socket.socket fd=7, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=0, laddr=('127.0.0.2', 12345), raddr=('127.0.0.3', 37833)>
    -> b'server'
    <- b'client'
    Closed client socket

    Client connected from 127.0.0.4:39463
    <socket.socket fd=8, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=0, laddr=('127.0.0.2', 12345), raddr=('127.0.0.4', 39463)>
    -> b'server'
    <- b'client'
    Closed client socket

Client:

    $ python3 client.py A
    Connected from 127.0.0.3
    <socket.socket fd=6, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=0, laddr=('127.0.0.3', 37833), raddr=('127.0.0.2', 12345)>
    <- b'server'
    -> b'client'
    Closed socket

    Connected from 127.0.0.4
    <socket.socket fd=6, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=0, laddr=('127.0.0.4', 39463), raddr=('127.0.0.2', 12345)>
    <- b'server'
    -> b'client'
    Closed socket

Behavior for different combinations of closure modes
====================================================

Perhaps this is overkill, but here's what happens for all 15 possible
combinations of how the client and server close their connections.

For example, "Client=S, Server=S" below means we run `$ python3
client.py S` and `$ python3 server.py S`. Sometimes multiple
combinations have the same behavior, so they're grouped together
below.

Client=S, Server='';
Client=S, Server=S;
Client=S, Server=A;
Client=S, Server=SA
-------------------

Client hangs on `r.read` on second connection, and killing the server
on the other end has no effect, with the socket stuck in CLOSE_WAIT on
the client side forever.

Client='', Server=S;
Client=A, Server=S
------------------

Server hangs on `r.read` on second connection, and client exits
normally, with the socket stuck in CLOSE_WAIT on the server side
forever.

Client=SA, Server=S
-------------------

Everything works the first time, but if you run the client in a loop,
e.g. with

    $ while true; do python3 client.py SA; done

then the server will eventually hang on `r.read` after ~3 client
sessions.

Client='', Server='';
Client='', Server=A;
Client=A, Server='';
Client=SA, Server=''
-------------------

Everything works! But here we see that the client and/or server
(whichever side is using '' for mode) is not reusing the socket file
descriptors right away (have to wait for GC). This is evidence that
the problem is due to stale state in asyncio tied to the reused file
descriptors.

Client=A, Server=A;
Client=A, Server=SA;
Client=SA, Server=A;
Client=SA, Server=SA
--------------------

Everything works, but this is not surprising because both sides closed
the `StreamWriter` with `w.close()`.

Possibly related bugs
=====================

https://bugs.python.org/issue43253: Windows only, calling
socket.socket.close() on a socket used with
asyncio.open_connection(sock=socket).

https://bugs.python.org/issue41317: reused file descriptors across
different connections.

https://bugs.python.org/issue34795,
https://bugs.python.org/issue30064: closing a socket used with
asyncio.

https://bugs.python.org/issue35065: reading from a closed stream can
hang. Could be related if the problem is due to aliased streams from
reused file descriptors.

https://bugs.python.org/issue43183: sockets used with asyncio getting
stuck in WAIT_CLOSED.

System info
===========

$ python3 --version
Python 3.9.6

$ lsb_release -a
LSB Version:    core-9.20170808ubuntu1-noarch:printing-9.20170808ubuntu1-noarch:security-9.20170808ubuntu1-noarch
Distributor ID: Ubuntu
Description:    Ubuntu 18.04.5 LTS
Release:        18.04
Codename:       bionic
msg398734 - (view) Author: Nathan Collins (ntc2) * Date: 2021-08-02 08:50
Oh, and I can't count: there are 16 = 4x4 possible combinations of socket closure modes for the client and server. The one I missed was Client='', Server=SA, where everything works because the client doesn't reuse file descriptors and the server closes its sockets correctly.
msg404916 - (view) Author: Nathan Collins (ntc2) * Date: 2021-10-24 08:09
Just wanted to clarify: my previous "where everything works" comment is not saying this bug doesn't exist, I just mean I missed one case in my analysis of the bug. The bug is very much there, and easy to reproduce using the example programs I attached.

Bump!
History
Date User Action Args
2022-04-11 14:59:48adminsetgithub: 88968
2021-10-24 08:09:32ntc2setmessages: + msg404916
2021-08-02 08:50:52ntc2setmessages: + msg398734
2021-08-02 07:45:52ntc2setfiles: + client.py
2021-08-02 07:45:23ntc2create