Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

socket.socket(fileno=fd) does not work as documented #72321

Closed
tiran opened this issue Sep 13, 2016 · 25 comments
Closed

socket.socket(fileno=fd) does not work as documented #72321

tiran opened this issue Sep 13, 2016 · 25 comments
Assignees
Labels
3.7 (EOL) end of life extension-modules C modules in the Modules dir type-bug An unexpected behavior, bug, or error

Comments

@tiran
Copy link
Member

tiran commented Sep 13, 2016

BPO 28134
Nosy @pitrou, @vstinner, @tiran, @bitdancer, @njsmith, @vadmium, @1st1, @YoSTEALTH
PRs
  • bpo-28134: Auto-detect socket values from file descriptor #1349
  • bpo-28134: Ignore proto in unknown socket test #5435
  • Files
  • sock_initobj_types.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/tiran'
    closed_at = <Date 2021-10-21.12:05:20.757>
    created_at = <Date 2016-09-13.19:13:31.625>
    labels = ['extension-modules', 'type-bug', '3.7']
    title = 'socket.socket(fileno=fd) does not work as documented'
    updated_at = <Date 2021-10-21.12:05:20.756>
    user = 'https://github.com/tiran'

    bugs.python.org fields:

    activity = <Date 2021-10-21.12:05:20.756>
    actor = 'christian.heimes'
    assignee = 'christian.heimes'
    closed = True
    closed_date = <Date 2021-10-21.12:05:20.757>
    closer = 'christian.heimes'
    components = ['Extension Modules']
    creation = <Date 2016-09-13.19:13:31.625>
    creator = 'christian.heimes'
    dependencies = []
    files = ['44687']
    hgrepos = []
    issue_num = 28134
    keywords = ['patch']
    message_count = 25.0
    messages = ['276327', '276328', '276330', '276363', '276395', '276660', '276684', '276866', '276879', '285972', '308433', '309207', '309225', '311175', '311176', '311178', '311181', '311252', '312681', '312684', '312690', '312708', '313153', '332704', '404590']
    nosy_count = 8.0
    nosy_names = ['pitrou', 'vstinner', 'christian.heimes', 'r.david.murray', 'njs', 'martin.panter', 'yselivanov', 'YoSTEALTH']
    pr_nums = ['1349', '5435']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue28134'
    versions = ['Python 3.5', 'Python 3.6', 'Python 3.7']

    @tiran
    Copy link
    Member Author

    tiran commented Sep 13, 2016

    Documentation of socket.socket(fileno) https://docs.python.org/3/library/socket.html#socket.socket says:

    If fileno is specified, the other arguments are ignored, causing the socket with the specified file descriptor to return.

    The feature does not work. fileno does not infer the values for family, type and proto from the fd. Instead if uses the other arguments. I don't see how this feature should have ever worked on POSIX. There are no calls to getsockopt() with SO_DOMAIN, SO_TYPE and SO_PROTOCOL.

    $ ./python 
    Python 3.7.0a0 (default:6bcedf96d25f, Sep 13 2016, 20:48:50) 
    [GCC 6.1.1 20160621 (Red Hat 6.1.1-3)] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import socket
    >>> uds = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    >>> uds
    <socket.socket fd=3, family=AddressFamily.AF_UNIX, type=SocketKind.SOCK_STREAM, proto=0>
    >>> s = socket.socket(fileno=uds.fileno())
    >>> s
    <socket.socket fd=3, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=0>
    >>> s.family == uds.family
    False
    >>> s2 = socket.socket(type=socket.SOCK_DGRAM, fileno=uds.fileno())
    >>> s2
    <socket.socket fd=3, family=AddressFamily.AF_INET, type=SocketKind.SOCK_DGRAM, proto=0>

    @tiran tiran added 3.7 (EOL) end of life type-bug An unexpected behavior, bug, or error labels Sep 13, 2016
    @tiran
    Copy link
    Member Author

    tiran commented Sep 13, 2016

    I expected that socket.socket(fileno) would fill in all socket options like my own implementation https://github.com/tiran/socketfromfd/blob/master/socketfromfd.py

    @bitdancer
    Copy link
    Member

    See also bpo-27377?

    @vadmium
    Copy link
    Member

    vadmium commented Sep 14, 2016

    The documentation says that the family, type and proto attributes correspond to the constructor arguments. Although it is unfortunate and quirky, I think your behaviour does match the documentation.

    Do the mismatched settings cause any serious problems with socket methods, or just affect the Python-level attributes and repr()?

    Even without using fileno=..., you could argue that the proto attribute is not ideal:
    >>> s = socket()
    >>> s.proto
    0
    >>> s.getsockopt(SOL_SOCKET, SO_PROTOCOL)
    6
    >>> IPPROTO_TCP
    6

    Perhaps the way forward is to deprecate fileno=... in favour of Neil’s fromfd2() function. That would avoid any confusion about conflicting socket() constructor arguments and defaults. I.e. what does socket(AF_INET, SOCK_STREAM, fileno=unix_datagram_fd) mean, and is it equivalent to socket(filno=unix_datagram_fd)?

    @tiran
    Copy link
    Member Author

    tiran commented Sep 14, 2016

    Martin, the documentation says "If fileno is specified, the other arguments are ignored, causing the socket with the specified file descriptor to return." It's a direct quote from Python 3's socket library documentation. For a non-native speaker like me, this sentence implies that socket.socket(fileno) not only ignores the arguments (which it does not) but that the constructor uses the fileno to set family, type and proto.

    The socket module uses self->sock_family in several places to calculate the addr len or when it handles arguments and address information. Just look at this simple example:

    >>> import socket
    >>> uds = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    >>> s = socket.socket(fileno=uds.fileno())
    >>> s.bind('/tmp/sock')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: getsockaddrarg: AF_INET address must be tuple, not str
    >>> uds.bind('/tmp/sock')

    @vadmium
    Copy link
    Member

    vadmium commented Sep 16, 2016

    I agree the doc is far from perfect. The bit I was going off is just above <https://docs.python.org/3.5/library/socket.html#socket.socket.family\>, saying “these (read-only) attributes that correspond to the values given to the socket constructor”.

    My instinct would be to clarify that for existing versions 2.7, 3.5, etc, that the constructor arguments are _not_ ignored and should correspond to the file descriptor. Then in the next Python version we can make it more automatic using the getsockopt() techniques.

    @tiran
    Copy link
    Member Author

    tiran commented Sep 16, 2016

    How about we fix the code and only document the limitations instead? :) After all it works fine on Windows and it is documented to work on all operating systems. Since it's a bug we can fix it in 3.5, too.

    My patch implements a best-effort to get type, family and proto from the socket. It ignores any errors.

    @vadmium
    Copy link
    Member

    vadmium commented Sep 18, 2016

    Personally, I’m not too enthusiastic, because it is rather magical, and does not work in all cases. It seems more like a feature than a bug fix. But I have rarely used the fileno=... parameter, and it shouldn’t have much negative impact, so I’m not too fussed.

    According to bpo-27377, these are some cases where parts won’t work:

    • Windows and OS X (and older versions of Linux and BSD) don’t have SO_PROTOCOL
    • getsockname() not guaranteed to work on unbound sockets, especially on Windows, and Free BSD with SCTP sockets

    Also, if we are going to read SO_PROTOCOL when fileno=... is given, why not also read it in the normal case when proto=0 (unspecified) is given?

    @tiran
    Copy link
    Member Author

    tiran commented Sep 18, 2016

    I'm well aware that it does not work in all cases. However it works good enough in most cases. Right now the fileno argument must be considered broken because it leads to wrong results. It is a problem and possible security issue for a couple of use cases, e.g. passing of sockets through AF_UNIX AUX data or systemd socket activation.

    On Windows it is less problematic because socket(filno) works correctly with WSAPROTOCOL_INFO. It's only broken for integer fd.

    I have considered to set type, family and proto to 0 (unspec) when the getsockopt and getsockname fail.

    I have a differnt ticket for the protocol issue, bpo-27816.

    @tiran tiran added the extension-modules C modules in the Modules dir label Sep 21, 2016
    @tiran tiran self-assigned this Sep 21, 2016
    @tiran
    Copy link
    Member Author

    tiran commented Jan 21, 2017

    Here is another example of how broken and dangerous fileno argument is. getpeername() is neither a valid IPv4 tuple nor a valid IPv6 tuple. It's all messed up:

    >>> import socket
    >>> s = socket.create_connection(('www.python.org', 443))
    >>> s
    <socket.socket fd=3, family=AddressFamily.AF_INET6, type=SocketKind.SOCK_STREAM, proto=6, laddr=('2003:58:bc4a:3b00:56ee:75ff:fe47:ca7b', 59730, 0, 0), raddr=('2a04:4e42:1b::223', 443, 0, 0)>
    >>> socket.socket(fileno=s.fileno())
    <socket.socket fd=3, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=0, laddr=('2003:58:bc4a:3b00::%2550471192', 59730, 0, 2550471192), raddr=('2a04:4e42:1b:0:700c:e70b:ff7f:0%2550471192', 443, 0, 2550471192)>

    @1st1
    Copy link
    Member

    1st1 commented Dec 16, 2017

    Let's quickly iterate over what's possible first:

    • It's possible to check the type of the FD using "getsockopt(SOL_SOCKET, SO_TYPE)" on all platforms.

    • It's possible to check family/proto of the FD using "getsockopt(SOL_SOCKET, SO_DOMAIN/SO_PROTOCOL)" on Linux.

    Given the above I propose the following:

    1. Passing a wrong type can be considered as a serious error. SOCK_STREAM is fundamentally different from SOCK_DGRAM. Because we have a way to validate socket type on all platforms, I propose to do this validation when we create a socket from an FD, and raise an error if the passed socket type isn't correct.

    2. When Python is running with '-X dev' (new dev mode added by Victor), I propose to also validate socket.family and socket.proto on Linux, and raise a RuntimeWarning when they don't match.

    @tiran
    Copy link
    Member Author

    tiran commented Dec 29, 2017

    Yuri,

    • The patch is purely about Python's view of the socket. The actual behavior of the OS socket fd is not influenced by socket.socket()'s family, type and protocol. However the especially the family is critical for Python because a lot of socket code uses the family to decide how to format its address or the peer address.

    • On some platforms and/or for some socket types, it is not possible to get the address family or type unless connect() or bind() have been called. On Linux it seems to work for both new sockets and bound/connected sockets. On Windows it never works for a fresh socket.

    • It looks like Windows doesn't have protocol dedicated and always uses 0 as protocols.

    • Using the wrong type is less of an issue. Using the wrong family is really really bad, see https://bugs.python.org/issue28134#msg285972.

    • I don't care much about validating the values. I'm concerned to have correct values by default. Validation can be implemented in a separate PR. In that case we want to add socket.closefd(fd: int). On Windows os.close() can't be used for socket fds.

    @tiran
    Copy link
    Member Author

    tiran commented Dec 30, 2017

    Issue bpo-32454 adds socket.close(fd) function.

    @tiran
    Copy link
    Member Author

    tiran commented Jan 29, 2018

    New changeset b6e43af by Christian Heimes in branch 'master':
    bpo-28134: Auto-detect socket values from file descriptor (bpo-1349)
    b6e43af

    @tiran
    Copy link
    Member Author

    tiran commented Jan 29, 2018

    I'm leaving the ticket open to remind me that I have to add a whatsnew entry and maybe consider a backport.

    @vstinner
    Copy link
    Member

    x86-64 Sierra 3.x is grumpy:

    http://buildbot.python.org/all/#/builders/14/builds/659

    ======================================================================
    FAIL: test_uknown_socket_family_repr (test.test_socket.GeneralModuleTests)
    ----------------------------------------------------------------------

    Traceback (most recent call last):
      File "/Users/buildbot/buildarea/3.x.billenstein-sierra/build/Lib/test/test_socket.py", line 1645, in test_uknown_socket_family_repr
        self.assertEqual(s.proto, 23)
    AssertionError: 0 != 23

    @tiran
    Copy link
    Member Author

    tiran commented Jan 29, 2018

    The test for proto isn't super critical. It's mostly ignored any way. I'll submit a band-aid.

    @tiran
    Copy link
    Member Author

    tiran commented Jan 30, 2018

    New changeset 2e0ecde by Christian Heimes in branch 'master':
    bpo-28134: Ignore proto in unknown socket test (GH-5435)
    2e0ecde

    @YoSTEALTH
    Copy link
    Mannequin

    YoSTEALTH mannequin commented Feb 24, 2018

    I am using 3.7.0b1 i don't think this issue is fixed!

    # simple mockup:
    # --------------

    def accept(sock):
        client, addr = sock.accept()
        inside = socket(fileno=client.fileno())
        print(inside)
        # <__main__.Socket fd=5, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=6, laddr=('127.0.0.1', 8000), raddr=('127.0.0.1', 42532)>
        return inside
    
    outside = accept(sock)
    print(outside)
    # <__main__.Socket fd=5, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=6>

    So the "laddr" and "raddr" goes missing the second its out of the function???

    This has wasted days of my time, to even get to this point of figuring out whats going on wasn't easy! extremely frustrating.

    @tiran
    Copy link
    Member Author

    tiran commented Feb 24, 2018

    The problem is fixed and your example behaves as expected.

    The laddr string is the result of inside.getsockname() and raddr string is the result of inside.getpeername(). In your example, inside and client share the same file descriptor. When the function exits client goes out of scope, the reference count of client drops to 0, the object gets deallocated and Python closes the shared file descriptor. In outside, the shared fd is gone and outside.getsockname() and outside.getpeername() fail. Basically the outside socket is dead because its fd has been closed.

    You have to duplicate the fd or detach the socket to keep the fd open.

    @YoSTEALTH
    Copy link
    Mannequin

    YoSTEALTH mannequin commented Feb 24, 2018

    Christian thank you for your reply, i really appreciate it.

    Lets analyze this a bit:

    • Technically speaking i can "return client" directly and it would NOT close the socket.
    • Shouldn't "inside" having reference to same fd mean +1 to reference count. Considering its a new object?
      id(client): 140340037397192
      id(inside): 140340010863560

    I could understand making a duplicate of the fd if it were being passed across thread/process but in its original thread making a duplicate! This behavior is extremely odd.

    I fell like everyone that uses socket(fileno) will run into days wasted and frustration just to figure out these solutions.

    @tiran
    Copy link
    Member Author

    tiran commented Feb 24, 2018

    File descriptors are a advanced features and expose low level operating system resources. You really have to understand how the OS works. They cannot be reference counted. In fact they *are* the reference to entries in the Kernel's global open file table. I gave a talked about FDs at PyCon US two years ago, maybe https://speakerdeck.com/tiran/pycon-2016-file-descriptors-unix-sockets-and-other-posix-magic will help you understand fds better.

    You can make your example work with https://docs.python.org/3/library/socket.html#socket.socket.detach

    def accept(sock):
        client, addr = sock.accept()
        inside = socket(fileno=client.fileno())
        client.detach()
        print(inside)
        return inside
        # after return, the client socket is closed by due to detach the fd isn't no longer close

    The feature works as intended. It's designed to turn an inherited file descriptor (e.g. systemd socket activation) or transfered fds (e.g. through AF_UNIX SCM_RIGHTS). In both cases the fd is already a duplicated fd.

    @YoSTEALTH
    Copy link
    Mannequin

    YoSTEALTH mannequin commented Mar 2, 2018

    It would be nice if "python" accounted for such low level os things. None the less client.detach() method works fine.

    I really did enjoy your talk, kinda bummed it was short and didn't get into more details.

    Thanks for your help and patience Christian :)

    @njsmith
    Copy link
    Contributor

    njsmith commented Dec 29, 2018

    Am I right that this is considered fixed (or at least as fixed as it can be), and the issue should be closed?

    Also, small note in case anyone stumbles across this in the future and is confused: Python does *not* handle this correctly on Windows. I suspect Christian was confused because there's an undocumented features on Windows where if you pass fileno=<opaque string returned by socket.share()>, then that correctly reinstantiates the socket object. But fileno=<raw socket handle> doesn't seem to do any special autodetection of type/family/proto.

    @tiran
    Copy link
    Member Author

    tiran commented Oct 21, 2021

    Yes, the fix works well enough.

    @tiran tiran closed this as completed Oct 21, 2021
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.7 (EOL) end of life extension-modules C modules in the Modules dir type-bug An unexpected behavior, bug, or error
    Projects
    None yet
    Development

    No branches or pull requests

    6 participants