PyOS_AfterFork should reset socketmodule's lock #70108

Closed
ajdavis opened this issue Dec 21, 2015 · 12 comments
Labels: 3.10 (only security fixes), interpreter-core (Objects, Python, Grammar, and Parser dirs), type-bug (An unexpected behavior, bug, or error)

Comments

@ajdavis
Contributor

ajdavis commented Dec 21, 2015

BPO 25920
Nosy @terryjreedy, @ronaldoussoren, @vstinner, @serhiy-storchaka, @1st1, @ajdavis
PRs
  • bpo-25920: Remove socket.getaddrinfo() lock on macOS #20177
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = None
    closed_at = <Date 2020-05-28.15:37:32.925>
    created_at = <Date 2015-12-21.23:19:32.190>
    labels = ['interpreter-core', 'type-bug', '3.10']
    title = "PyOS_AfterFork should reset socketmodule's lock"
    updated_at = <Date 2020-05-28.15:37:32.924>
    user = 'https://github.com/ajdavis'

    bugs.python.org fields:

    activity = <Date 2020-05-28.15:37:32.924>
    actor = 'vstinner'
    assignee = 'none'
    closed = True
    closed_date = <Date 2020-05-28.15:37:32.925>
    closer = 'vstinner'
    components = ['Interpreter Core']
    creation = <Date 2015-12-21.23:19:32.190>
    creator = 'emptysquare'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 25920
    keywords = ['patch']
    message_count = 12.0
    messages = ['256815', '256817', '256920', '368876', '368881', '368883', '369031', '369037', '369116', '369220', '370227', '370229']
    nosy_count = 8.0
    nosy_names = ['terry.reedy', 'ronaldoussoren', 'vstinner', 'ionelmc', 'serhiy.storchaka', 'yselivanov', 'emptysquare', 'hugh']
    pr_nums = ['20177']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue25920'
    versions = ['Python 3.10']

    @ajdavis
    Contributor Author

    ajdavis commented Dec 21, 2015

    On some platforms there's an exclusive lock in socketmodule, used for getaddrinfo, gethostbyname, gethostbyaddr. A thread can hold this lock while another forks, leaving it locked forever in the child process. Calls to these functions in the child process will hang.

    (I wrote some more details here: https://emptysqua.re/blog/getaddrinfo-deadlock/ )

    I propose that this is a bug, and that it can be fixed in PyOS_AfterFork, where a few similar locks are already reset.
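The failure mode described above can be sketched in pure Python, assuming a POSIX system where os.fork() is available. The threading.Lock named lock is a hypothetical stand-in for the C-level lock in socketmodule; it is not how the module itself is written. The child acquires with a timeout so the demonstration fails fast instead of hanging.

```python
import os
import threading

# Hypothetical Python-level stand-in for the lock in socketmodule.c.
lock = threading.Lock()


def hold_lock(acquired, release):
    # Simulates a thread that is in the middle of a slow getaddrinfo() call.
    with lock:
        acquired.set()
        release.wait()


def child_can_acquire(timeout=1.0):
    """Fork while another thread holds `lock`.

    Returns True if the child managed to acquire the lock within
    `timeout` seconds, False otherwise.
    """
    acquired = threading.Event()
    release = threading.Event()
    holder = threading.Thread(target=hold_lock, args=(acquired, release))
    holder.start()
    acquired.wait()  # the helper thread now owns the lock

    pid = os.fork()
    if pid == 0:
        # Child: only the forking thread exists here, so nothing will
        # ever release the inherited, still-locked lock.
        ok = lock.acquire(timeout=timeout)
        os._exit(0 if ok else 1)

    # Parent: let the helper finish, then collect the child's verdict.
    release.set()
    holder.join()
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status) == 0
```

With CPython on a POSIX platform, child_can_acquire() is expected to return False: the child inherits the lock in its locked state and times out.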

    @1st1 added the interpreter-core and type-bug labels Dec 21, 2015
    @1st1
    Member

    1st1 commented Dec 21, 2015

    Maybe instead of releasing the lock in the forked child process, we should try to acquire the lock in the os.fork() implementation, and then release it?

    Otherwise, suppose that a call to getaddrinfo (call #1) takes a long time. In the middle of it we fork, and then immediately call getaddrinfo again (call #2, while call #1 is still in progress) for some other address. At this point, since getaddrinfo isn't thread-safe, something bad will happen.
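The acquire-around-fork idea can be sketched at the Python level with os.register_at_fork(), assuming a POSIX system. The netdb_lock object here is a hypothetical threading.Lock standing in for the C lock, and the hooks illustrate the proposal only; they are not code from CPython.

```python
import os
import threading
import time

# Hypothetical Python-level stand-in for the C-level lock.
netdb_lock = threading.Lock()

# Take the lock just before fork() and release it on both sides afterwards,
# so the fork can never happen while another thread is inside a lookup.
os.register_at_fork(
    before=netdb_lock.acquire,
    after_in_parent=netdb_lock.release,
    after_in_child=netdb_lock.release,
)


def fork_while_busy(hold_seconds=0.2, timeout=1.0):
    """Fork while a helper thread holds the lock; return True if the
    child could acquire the lock within `timeout` seconds."""
    started = threading.Event()

    def slow_lookup():
        # Simulates a slow getaddrinfo() call holding the lock.
        with netdb_lock:
            started.set()
            time.sleep(hold_seconds)

    helper = threading.Thread(target=slow_lookup)
    helper.start()
    started.wait()

    # Blocks in the `before` hook until slow_lookup releases the lock.
    pid = os.fork()
    if pid == 0:
        ok = netdb_lock.acquire(timeout=timeout)
        os._exit(0 if ok else 1)

    helper.join()
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status) == 0
```

The trade-off in this sketch is that os.fork() itself now waits for as long as the in-flight lookup takes.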

    @ronaldoussoren
    Contributor

    bpo-25924 is related to this; I filed it after reading the blog post. The lock might not be necessary on OSX, and possibly on the other systems as well.

    Yury: resetting the lock in the child should be safe because after the fork the child only has a single thread that is returning from fork(2). The thread that acquired the lock does not exist in the child process.
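The single-thread claim can be checked with a small sketch, assuming a POSIX system. The function name threads_across_fork is hypothetical; it reports how many Python threads are visible before the fork and inside the child.

```python
import os
import threading


def threads_across_fork():
    """Return (thread count in parent before fork, thread count in child)."""
    stop = threading.Event()
    helper = threading.Thread(target=stop.wait)  # parked until stop.set()
    helper.start()
    parent_count = threading.active_count()

    pid = os.fork()
    if pid == 0:
        # Child: report the number of live Python threads via the exit code.
        os._exit(threading.active_count())

    stop.set()
    helper.join()
    _, status = os.waitpid(pid, 0)
    return parent_count, os.WEXITSTATUS(status)
```

In the child only the thread that called fork survives, so the second value is expected to be 1 even though the parent had at least two threads running.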

    @terryjreedy
    Member

    Does the example code (which should be posted here) still hang?

    If so, automated tests that hang indefinitely on failure are a nuisance. A revised example that failed after, say, a second would be better.
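One way to get a test that fails quickly rather than hanging is to run the suspect call in a forked child and enforce a deadline from the parent. This is a minimal sketch assuming a POSIX system; finishes_in_child is a hypothetical helper, not part of any test suite.

```python
import os
import signal
import time


def finishes_in_child(func, deadline=1.0, poll=0.01):
    """Run func() in a forked child.

    Returns True if the child exits cleanly within `deadline` seconds;
    otherwise kills the child and returns False.
    """
    pid = os.fork()
    if pid == 0:
        try:
            func()
        except BaseException:
            os._exit(1)
        os._exit(0)

    end = time.monotonic() + deadline
    while time.monotonic() < end:
        done, status = os.waitpid(pid, os.WNOHANG)
        if done == pid:
            return os.WIFEXITED(status) and os.WEXITSTATUS(status) == 0
        time.sleep(poll)

    # Deadline passed: stop the child and reap it so no zombie is left.
    os.kill(pid, signal.SIGKILL)
    os.waitpid(pid, 0)
    return False
```

A regression test could pass a function that calls socket.getaddrinfo() as func and assert that the result is True, so a deadlocked child shows up as a failure after about a second.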

    @terryjreedy added the 3.9 label May 14, 2020
    @vstinner
    Member

    > Maybe instead of releasing the lock in the forked child process, we should try to acquire the lock in the os.fork() implementation, and then release it?

    In bpo-40089, I added _PyThread_at_fork_reinit() for this purpose: reinitialize a lock to the unlocked state after a fork. Internally, it leaks memory on purpose and then creates a new lock, since there is no portable way to reset a lock after fork.

    The problem is how to register netdb_lock of Modules/socketmodule.c into a list of locks which should be reinitialized at fork, or maybe how to register a C callback called at fork. There is a *Python* API to register a callback after a fork: os.register_at_fork().

    See also the meta-issue bpo-6721: "Locks in the standard library should be sanitized on fork".
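A Python-level analogue of reinitializing a lock after fork can be sketched with os.register_at_fork(), assuming a POSIX system. The netdb_lock below is a hypothetical threading.Lock, not the C lock in Modules/socketmodule.c, and the callback only mirrors the idea of replacing the lock with a fresh one; it does not use _PyThread_at_fork_reinit(), which is a private C function.

```python
import os
import threading

# Hypothetical Python-level stand-in for netdb_lock.
netdb_lock = threading.Lock()


def _reinit_in_child():
    # Replace the lock instead of releasing it: whoever owned it at fork
    # time is not running in the child, so a fresh unlocked lock is used.
    global netdb_lock
    netdb_lock = threading.Lock()


os.register_at_fork(after_in_child=_reinit_in_child)


def fork_with_lock_held(timeout=1.0):
    """Fork while netdb_lock is held; return True if the child could
    acquire the (reinitialized) lock within `timeout` seconds."""
    netdb_lock.acquire()  # the lock is held at fork time
    try:
        pid = os.fork()
        if pid == 0:
            ok = netdb_lock.acquire(timeout=timeout)
            os._exit(0 if ok else 1)  # exits without running `finally`
    finally:
        netdb_lock.release()  # parent releases its own, original lock
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status) == 0
```

Because the child swaps in a new lock before any user code runs, fork_with_lock_held() is expected to return True, whereas the same fork without the callback would time out in the child.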

    @vstinner
    Member

    > (I wrote some more details here: https://emptysqua.re/blog/getaddrinfo-deadlock/ )

    On macOS, Python is only affected if "MAC_OS_X_VERSION_MIN_REQUIRED < MAC_OS_X_VERSION_10_5". Is it still the case in 2020?

    Copy/paste of socketmodule.c:

    /* On systems on which getaddrinfo() is believed to not be thread-safe,
    (this includes the getaddrinfo emulation) protect access with a lock.

    getaddrinfo is thread-safe on Mac OS X 10.5 and later. Originally it was
    a mix of code including an unsafe implementation from an old BSD's
    libresolv. In 10.5 Apple reimplemented it as a safe IPC call to the
    mDNSResponder process. 10.5 is the first be UNIX '03 certified, which
    includes the requirement that getaddrinfo be thread-safe. See issue bpo-25924.

    It's thread-safe in OpenBSD starting with 5.4, released Nov 2013:
    http://www.openbsd.org/plus54.html

    It's thread-safe in NetBSD starting with 4.0, released Dec 2007:

    http://cvsweb.netbsd.org/bsdweb.cgi/src/lib/libc/net/getaddrinfo.c.diff?r1=1.82&r2=1.83
     */
    #if ((defined(__APPLE__) && \
            MAC_OS_X_VERSION_MIN_REQUIRED < MAC_OS_X_VERSION_10_5) || \
        (defined(__FreeBSD__) && __FreeBSD_version+0 < 503000) || \
        (defined(__OpenBSD__) && OpenBSD+0 < 201311) || \
        (defined(__NetBSD__) && __NetBSD_Version__+0 < 400000000) || \
        !defined(HAVE_GETADDRINFO))
    #define USE_GETADDRINFO_LOCK
    #endif

    @ronaldoussoren
    Contributor

    The macOS test checks if the binary targets macOS 10.4 or earlier. Those versions of macOS have been out of support for a very long time, and we haven't had installers targeting those versions of macOS for a long time as well. 2.7 and 3.5 had installers targeting macOS 10.5, current installers target macOS 10.9.

    IMHO macOS 10.4 has moved into museum territory and I wouldn't bother supporting it anymore.

    USE_GETADDRINFO_LOCK is only enabled for very old OS releases. The last OS to stop requiring it was OpenBSD, in 2013 (7 years ago); the other OSes stopped requiring it in 2007 (13 years ago).

    I'd drop this code instead of fixing it.

    @vstinner
    Member

    > I'd drop this code instead of fixing it.

    Hum, FreeBSD, OpenBSD and NetBSD versions which require the fix also look very old. So I agree that it became safe to remove the fix.

    Would it make sense to only fix it on Python 3.10 and leave other versions with the bug? Or should we fix all Python versions?

    @ronaldoussoren
    Contributor

    Technically this would be a functional change, I'd drop this code in 3.9 and trunk (although it is awfully close to the expected date for 3.9b1).

    Older versions would keep this code and the bug, that way the older python versions can still be used on these ancient OS versions (but users might run into this race condition).

    @vstinner
    Member

    I wrote PR 20177 to avoid the netdb_lock in socket.getaddrinfo(), but the lock is still used on platforms which don't provide gethostbyname_r():

    #if !defined(HAVE_GETHOSTBYNAME_R) && !defined(MS_WINDOWS)
    # define USE_GETHOSTBYNAME_LOCK
    #endif

    @vstinner
    Member

    New changeset 0de437d by Victor Stinner in branch 'master':
    bpo-25920: Remove socket.getaddrinfo() lock on macOS (GH-20177)

    @vstinner
    Member

    If I understood correctly, Python 3.8 and 3.9 binaries provided by python.org are *not* impacted by this issue.

    Only Python binaries built manually with explicit support for macOS 10.4 ("MAC_OS_X_VERSION_MIN_REQUIRED") were impacted.

    Python 3.9 and older are not fixed (they keep the lock). The workaround is to require macOS 10.5 or newer. macOS 10.4 was released in 2004; it's maybe time to stop supporting it :-)

    Python 3.7 (and newer) requires macOS 10.6 or newer (again, I'm talking about binaries provided by python.org).

    > bpo-25920: Remove socket.getaddrinfo() lock on macOS (GH-20177)

    I chose to leave the lock for gethostbyname(). Ronald wrote that this lock is no longer needed:
    "As an aside (not to be addressed in the PR): Apparently gethostbyname() and related functions are thread-safe on macOS. This is according to the manpage on macOS 10.15. I haven't checked in which version that changed. This allows avoiding the use of the gethostbyname lock as well."
    #20177 (review)

    Please open a separate issue for this lock.

    @vstinner added the 3.10 label and removed the 3.9 label May 28, 2020
    @ezio-melotti transferred this issue from another repository Apr 10, 2022