classification
Title: Fix socket.type on OSes with SOCK_NONBLOCK
Type: enhancement Stage: resolved
Components: Library (Lib) Versions: Python 3.7
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: asvetlov, christian.heimes, inada.naoki, njs, pitrou, vstinner, yselivanov
Priority: normal Keywords: patch

Created on 2017-12-15 03:42 by yselivanov, last changed 2017-12-19 01:14 by yselivanov. This issue is now closed.

Pull Requests
URL Status Linked Edit
PR 4877 merged yselivanov, 2017-12-15 03:45
Messages (23)
msg308363 - (view) Author: Yury Selivanov (yselivanov) * (Python committer) Date: 2017-12-15 03:42
On Linux, socket type is both a socket type and a bit mask (of SOCK_CLOEXEC and SOCK_NONBLOCK).  Therefore, anyone who write code like 'if sock.type == SOCK_STREAM' writes non-portable code, that occasionally breaks on Linux.  

This caused some hard to spot bugs in asyncio already: https://bugs.python.org/issue27456  -- this one was discovered only 1 year after the actual change.

On Linux, in 'include/linux/net.h' there's a SOCK_TYPE_MASK macro defined to 0xF.  The idea is that on Linux you should mask socket type before comparing it to SOCK_STREAM or other socket types.

Using socket.SOCK_NONBLOCK on Python was always error-prone even if one targets only Linux.  For instance, socket.type won't be updated to include SOCK_NONBLOCK if the socket was updated via ioctl call directly.

Therefore, I propose to fix socket.type getter to always (on Linux) apply that mask in Python.
msg308370 - (view) Author: INADA Naoki (inada.naoki) * (Python committer) Date: 2017-12-15 09:47
Should we backport it to 3.6?
msg308371 - (view) Author: INADA Naoki (inada.naoki) * (Python committer) Date: 2017-12-15 09:56
Document of socket.type says:

  The socket type.

  https://docs.python.org/3/library/socket.html#socket.socket.type

OK, so, are SOCK_CLOEXEC and SOCK_NONBLOCK socket type?

  These two constants, if defined, can be combined with the socket types and allow you to set some flags atomically...

   https://docs.python.org/3/library/socket.html#socket.SOCK_CLOEXEC

So they are not socket type.  They are constants which can be used for
setting some flags.

Regarding to this document, I think this can be bugfix rather than
behavior change.
msg308376 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2017-12-15 10:20
Python's handling of socket has multiple issues, see #28134 for more fun. At the moment the values of type, family and protocol are unreliable at best and dangerously wrong at worst. The attributes are only correct if and only if the socket has not been created from a file descriptor and the socket hasn't been manipulated with ioctl or setsockopt().

Now to the issue at hand. I'm -1 on applying a mask to type. You can easily work around your problem by replacing "sock.type == SOCK_STREAM" with "sock.type & SOCK_STREAM == SOCK_STREAM".
msg308409 - (view) Author: Yury Selivanov (yselivanov) * (Python committer) Date: 2017-12-15 16:32
> You can easily work around your problem by replacing "sock.type == SOCK_STREAM" with "sock.type & SOCK_STREAM == SOCK_STREAM".

Heh :) No, that would be a buggy code.  Try this on your Linux box:

    (socket.SOCK_SEQPACKET & socket.SOCK_STREAM) == socket.SOCK_STREAM

The _only_ way of checking socket type reliably is to explicitly reset SOCK_NONBLOCK and SOCK_CLOEXEC or to apply 0xF mask on Linux, and don't do that on other platforms.

That's why I think it's important to fix this one way or another.
msg308410 - (view) Author: Yury Selivanov (yselivanov) * (Python committer) Date: 2017-12-15 16:38
On GH (https://github.com/python/cpython/pull/4877), Antoine wrote:

> I agree with Victor:

> it can't go into 3.6
> making the change in 3.7 is contentious
> Can there be an other way to solve the issue? Can we for example keep socket.type as it is and add a utility function or a property reporting the "masked" type?

I agree.   Here's what I propose:

1. We don't touch `socket.type`.  We don't know what kind of code exists out there and in what ways it will break.  The ship has sailed to fix it.

2. We add `socket.SOCK_TYPE_MASK` (on Linux).

3. We fix `socket.__repr__` to correctly render type.  Currently:

    >>> s = socket.socket(socket.AF_INET, socket.SOCK_STREAM | socket.SOCK_NONBLOCK)
    >>> s
    <socket.socket fd=3, family=AddressFamily.AF_INET, type=2049, proto=0, laddr=('0.0.0.0', 0)>

What I want to see:

    <socket.socket fd=3, family=AddressFamily.AF_INET, type=SOCK_STREAM | SOCK_NONBLOCK, proto=0, laddr=('0.0.0.0', 0)>

4. We add `socket.true_type` which will apply `SOCK_TYPE_MASK` and return a socket type that's safe to compare to `socket.SOCK_*` constants.

Thoughts?
msg308411 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2017-12-15 16:59
Should socket.true_type use getsockopt(fd, ...) to get the actual type?

While we are talking about socket types, please have a look at #28134 .
msg308432 - (view) Author: Yury Selivanov (yselivanov) * (Python committer) Date: 2017-12-16 00:11
Update:

I've rewritten the PR from scratch.

1. SOCK_TYPE_MASK isn't exported on Linux.  Therefore we will not export socket.SOCK_TYPE_MASK too.

2. I've added the new socket.truetype property.

3. When a socket is created without an FD, e.g. "socket.socket(family, type, proto)", we trust that the type of the socket is correct.  It might contain extra flags on Linux, so we filter them out.

4. When a socket is created from an FD, e.g. with "socket.fromfd" or with "socket.socket(family, type, proto, FD)" we don't trust the type.  So socket.truetype will call getsockopt(SOL_SOCKET, SO_TYPE) on the FD to ensure we have the correct socket type at hand.

5. Since Linux doesn't export SOCK_TYPE_MASK I decided to hardcode it in socketmodule.c.  My reasoning:

a/ It's highly unlikely that Linux will implement 5 more new socket types anytime soon, so 0xF should last for a *very* long time.

b/ It's more likely than "a/" that they add a new flag like SOCK_CLOEXEC, in which case the "type & ~(SOCK_NONBLOCK | SOCK_CLOEXEC)" approach will fail.

So given a/ and b/ I think it's safe to just use 0xF mask on Linux.
msg308437 - (view) Author: Yury Selivanov (yselivanov) * (Python committer) Date: 2017-12-16 00:48
Another name option: socket.realtype
msg308442 - (view) Author: Nathaniel Smith (njs) * (Python committer) Date: 2017-12-16 03:19
I think the earliest open bug about this is bpo-21327.
msg308444 - (view) Author: Yury Selivanov (yselivanov) * (Python committer) Date: 2017-12-16 03:41
I've been thinking a lot about this problem, and I'm really tempted to fix sock.type property:

1. The problem first appeared in Python 3.2.

2. Python 2.7 doesn't have this problem at all, and doesn't even export socket.SOCK_NONBLOCK.  If we fix this in 3.7 it *will* actually help some poor souls with porting their network applications.

3. People use Python when they want a high-level portable language.  This annoying Linux quirk makes it super hard to write correct socket code.

4. I can't actually come up with any decent Linux-only example of using this quirk of socket.type.  Why would one check if sock.type has SOCK_NONBLOCK?  And even if they check it, one call to sock.settimeout() will make sock.type information outdated and simply wrong.

Let's just fix sock.type?
msg308446 - (view) Author: Yury Selivanov (yselivanov) * (Python committer) Date: 2017-12-16 03:46
There's also a precedent of us breaking socket API.  In Python < 3.6:

  sock = socket.socket()
  fd = sock.fileno()
  os.close(fd)
  sock.close()

Everything works fine.  In 3.6+:

  sock.close()  # Will raise OSError

uvloop broke on this during the beta period and I fixed the issue quickly.  I don't remember seeing any complaints about this, so I suspect that this issue wasn't a big deal, people just fixed their code.
msg308450 - (view) Author: Nathaniel Smith (njs) * (Python committer) Date: 2017-12-16 04:17
I did a little digging. It's more complicated than just "on Linux, these show up in the socket type".

Background: SOCK_NONBLOCK and SOCK_CLOEXEC are Linux-isms. The standard way to control blocking-ness and cloexec-ness is via ioctl/fcntl, and Linux supports that too. But as an extension, it also has these flags that can be passed as part of the socket type, which is nice because it lets you set the flag atomically at the time the socket is created, which is very important for CLOEXEC, and ... kinda pointless for NONBLOCK, but whatever, Linux supports it.

Note that as far as Linux is concerned, this is just a sneaky way to smuggle an extra argument through the socket() call. The flags *do not become part of the socket type*. If you create a socket with type=SOCK_STREAM | SOCK_NONBLOCK, and then do getsockopt(SOL_SOCKET, SOCK_TYPE) on it, Linux reports that the socket's type is just SOCK_STREAM, *without* mentioning SOCK_NONBLOCK.

One more important piece of background: Python's socket.type attribute doesn't really have any connection to the OS-level socket type. (This is part of what bpo-28134 is about.) Python just takes whatever value the user passed as the type= argument when calling socket(), and stashes it in an instance variable so it can be retrieved later.

Okay, so what's going on with socket.type? Python gained support for SOCK_NONBLOCK and SOCK_CLOEXEC in commit b1c54967381062bccef7db574b8e84f48a0eca76 (bpo-7523), which mainly just exposed the constants, so that users could manually pass these in as part of the type= argument when creating the socket. However, it also made another change: it made it so that that whenever Python toggles the blocking state of the socket, it also toggles the SOCK_NONBLOCK bit in the Python-level variable that represents socket.type.

Logically, this doesn't make any sense. As noted above, SOCK_NONBLOCK is never considered part of the socket type, on any operating system. And the function that is doing this doesn't even have anything to do with SOCK_NONBLOCK; it's actually using ioctl/fcntl, which is a totally unrelated mechanism for toggling blocking-ness. And it only does this on Linux (because only Linux *has* a SOCK_NONBLOCK bit to toggle), so this is kind of just introducing gratuitous brokenness.

The one advantage of this is that it makes these two pieces of code consistent:

   # Version A
   s = socket.socket(type=socket.SOCK_STREAM | socket.SOCK_NONBLOCK)
   # Blindly returns what we passed to type=, even though it's
   # not the actual type:
   s.type

   # Version B
   s = socket.socket()
   s.setblocking(False)
   s.type

This is a weird choice. Basically version A here is revealing a bug in how Python's type tracking works (s.type disagrees with s.getsockopt(SOL_SOCKET, SOL_TYPE)), that is only triggered if the user takes advantage of a rarely-used, non-portable feature. Then version B was written to intentionally introduce the same bug in a much more common case, I guess for consistency. But the bug is still non-portable.

The only time SOCK_CLOEXEC shows up in socket.type is if the user explicitly passed it in their call to socket.socket(), which has been a no-op (except for its effect on socket.type) since PEP 446 made SOCK_CLOEXEC the default.

I think there's a pretty strong argument here that what we really ought to do is *never* show SOCK_NONBLOCK or SOCK_CLOEXEC in in socket.type. That's what Linux itself does, it's the only possibility on every other OS, and it fixes this very annoying issue. (To add to Yury's comments about bugs in asycnio, it took me ages to figure out wtf was going on here and work around it in trio. The current behavior really does cause bugs.) It also sets the stage for fixing bpo-28134 -- right now socket.type's semantics are "whatever you passed in, whether that's actually the socket's type or not", and that's something that's impossible to determine reliably for a bare fd; I'm suggesting they should become "the socket's type", which *is* possible to determine reliably for a bare fd, at least in some cases.
msg308451 - (view) Author: Nathaniel Smith (njs) * (Python committer) Date: 2017-12-16 04:25
Re-reading my above post, I realized I want to emphasize a little more how odd the current relationship is between socket.type and SOCK_CLOEXEC. Right now, the way it works is:

Python *always* adds SOCK_CLOEXEC to the type that it passes to the operating system when creating a socket. You can later toggle this with socket.set_inheritable, and check it with socket.get_inheritable.

You might expect that you could also check it with (socket.type | SOCK_CLOEXEC), and this does tell you *something*... but it has *no relationship at all* to whether the socket has close-on-exec enabled. Instead, what it tells you whether you passed SOCK_CLOEXEC to the socket constructor. Given that passing SOCK_CLOEXEC to the socket constructor is otherwise *completely ignored*, this seems like a weird piece of information to make available.
msg308455 - (view) Author: Yury Selivanov (yselivanov) * (Python committer) Date: 2017-12-16 05:13
Nathaniel, thanks a lot for the comprehensive analysis.

It's now obvious that this weird Linux-specific socket.type quirk is of our own making and specific only to Python.

I've updated the PR:

1. *type* argument of 'socket.socket()' is passed as is to OS functions.  It is now cleared of SOCK_NONBLOCK and SOCK_CLOEXEC on Linux.

2. socket.setblocking() no longer applies SOCK_NONBLOCK to socket.type.

That's it.  I'm now certain that this is the correct way of handling this situation.  socket.type must be fixed.

Please review the updated PR.
msg308589 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-12-18 22:34
This issue is not specific to Linux. FreeBSD is also affected:

>>> s=socket.socket()
>>> s.type
<SocketKind.SOCK_STREAM: 1>
>>> s.setblocking(False)
>>> s.type
536870913

See for example https://sourceware.org/ml/libc-alpha/2013-08/msg00528.html
msg308590 - (view) Author: Yury Selivanov (yselivanov) * (Python committer) Date: 2017-12-18 22:36
> This issue is not specific to Linux. FreeBSD is also affected:

So instead of '#ifdef __linux__' we should use `#ifdef SOCK_NONBLOCK` etc.  I'll update the PR.
msg308591 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-12-18 22:36
The most portable solution seems to getsockopt():

>>> s.getsockopt(socket.SOL_SOCKET, socket.SO_TYPE)
1

But I would prefer to avoid a syscall just to clear flags :-/
msg308592 - (view) Author: Yury Selivanov (yselivanov) * (Python committer) Date: 2017-12-18 22:38
> But I would prefer to avoid a syscall just to clear flags :-/

Yes, that's what I want to do.

We control what constants the socket module exports. Let's just clear them if we export them -- that's a good working solution (and also the most backwards compatible, as opposed to calling sock.getsockopt)
msg308603 - (view) Author: Yury Selivanov (yselivanov) * (Python committer) Date: 2017-12-19 01:03
New changeset 9818142b1bd20243733a953fb8aa2c7be314c47c by Yury Selivanov in branch 'master':
bpo-32331: Fix socket.type when SOCK_NONBLOCK is available (#4877)
https://github.com/python/cpython/commit/9818142b1bd20243733a953fb8aa2c7be314c47c
msg308604 - (view) Author: Yury Selivanov (yselivanov) * (Python committer) Date: 2017-12-19 01:08
I've merged the PR.

Summary of the final change:

1. socket.socket(family, type, proto) constructor clears SOCK_NONBLOCK and SOCK_CLOEXEC from 'type' before assigning it to 'sock.type'.

2. socket.socket(family, SOCK_STREAM | SOCK_NONBLOCK) will still create a non-blocking socket.

3. socket.setblocking() no longer sets/unsets SOCK_NONBLOCK flag on sock.type.

This is 3.7 only change.

Big thanks to Nathaniel and Victor for the help!
msg308605 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-12-19 01:10
> This is 3.7 only change.

In this case, bpo-27456 should be reopened or a new issue should be opened to fix asyncio in Python 3.6, no?
msg308606 - (view) Author: Yury Selivanov (yselivanov) * (Python committer) Date: 2017-12-19 01:14
> In this case, bpo-27456 should be reopened or a new issue should be opened to fix asyncio in Python 3.6, no?

I'll make a PR to fix it in 3.7.  Code in 3.6 is good.
History
Date User Action Args
2018-05-29 14:07:06yselivanovlinkissue21327 superseder
2017-12-19 01:14:14yselivanovsetmessages: + msg308606
2017-12-19 01:10:26vstinnersetmessages: + msg308605
2017-12-19 01:09:51yselivanovsettitle: Fix socket.type on Linux -> Fix socket.type on OSes with SOCK_NONBLOCK
2017-12-19 01:08:26yselivanovsetstatus: open -> closed
resolution: fixed
messages: + msg308604

stage: patch review -> resolved
2017-12-19 01:03:00yselivanovsetmessages: + msg308603
2017-12-18 22:38:30yselivanovsetmessages: + msg308592
2017-12-18 22:36:38vstinnersetmessages: + msg308591
2017-12-18 22:36:25yselivanovsetmessages: + msg308590
2017-12-18 22:34:53vstinnersetmessages: + msg308589
2017-12-16 05:13:02yselivanovsetmessages: + msg308455
2017-12-16 04:50:42yselivanovsettitle: Add socket.truetype property -> Fix socket.type on Linux
2017-12-16 04:25:52njssetmessages: + msg308451
2017-12-16 04:17:11njssetmessages: + msg308450
2017-12-16 03:46:09yselivanovsetmessages: + msg308446
2017-12-16 03:41:19yselivanovsetmessages: + msg308444
2017-12-16 03:19:31njssetnosy: + njs
messages: + msg308442
2017-12-16 00:48:36yselivanovsetmessages: + msg308437
2017-12-16 00:11:31yselivanovsetmessages: + msg308432
2017-12-16 00:05:21yselivanovsettitle: apply SOCK_TYPE_MASK to socket.type on Linux -> Add socket.truetype property
2017-12-15 16:59:19christian.heimessetmessages: + msg308411
2017-12-15 16:38:52yselivanovsetmessages: + msg308410
2017-12-15 16:32:58yselivanovsetmessages: + msg308409
2017-12-15 10:20:42christian.heimessetnosy: + christian.heimes
messages: + msg308376
2017-12-15 09:56:50inada.naokisetmessages: + msg308371
2017-12-15 09:47:19inada.naokisetmessages: + msg308370
2017-12-15 03:45:18yselivanovsetkeywords: + patch
stage: patch review
pull_requests: + pull_request4771
2017-12-15 03:42:49yselivanovcreate