classification
Title: PyOS_AfterFork should reset socketmodule's lock
Type: behavior
Stage: resolved
Components: Interpreter Core
Versions: Python 3.10
process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To:
Nosy List: emptysquare, hugh, ionelmc, ronaldoussoren, serhiy.storchaka, terry.reedy, vstinner, yselivanov
Priority: normal
Keywords: patch

Created on 2015-12-21 23:19 by emptysquare, last changed 2020-05-28 15:37 by vstinner. This issue is now closed.

Pull Requests
URL Status Linked
PR 20177 merged vstinner, 2020-05-18 13:40
Messages (12)
msg256815 - (view) Author: A. Jesse Jiryu Davis (emptysquare) * Date: 2015-12-21 23:19
On some platforms there's an exclusive lock in socketmodule, used for getaddrinfo, gethostbyname, gethostbyaddr. A thread can hold this lock while another forks, leaving it locked forever in the child process. Calls to these functions in the child process will hang.

(I wrote some more details here: https://emptysqua.re/blog/getaddrinfo-deadlock/ )

I propose that this is a bug, and that it can be fixed in PyOS_AfterFork, where a few similar locks are already reset.
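
A minimal sketch of the hang described above, assuming a plain threading.Lock as a stand-in for the C-level lock in socketmodule. The names fake_netdb_lock, slow_lookup and child_can_take_lock are hypothetical, and the child uses a timeout instead of blocking forever so the script always terminates:

```python
import os
import threading

# Hypothetical stand-in for the C-level lock in socketmodule.c.
fake_netdb_lock = threading.Lock()


def slow_lookup(started, release):
    """Pretend to be a slow getaddrinfo() call that holds the lock."""
    with fake_netdb_lock:
        started.set()
        release.wait()


def child_can_take_lock(timeout=1.0):
    """Fork while another thread holds the lock.

    Returns True if the child acquired the lock within `timeout` seconds.
    """
    started = threading.Event()
    release = threading.Event()
    worker = threading.Thread(target=slow_lookup, args=(started, release))
    worker.start()
    started.wait()  # the worker thread now owns the lock

    pid = os.fork()
    if pid == 0:
        # Child: the worker thread does not exist here, but the lock it
        # held was copied in its locked state and nobody will release it.
        code = 1
        try:
            code = 0 if fake_netdb_lock.acquire(timeout=timeout) else 1
        finally:
            os._exit(code)

    _, status = os.waitpid(pid, 0)
    release.set()
    worker.join()
    return os.WIFEXITED(status) and os.WEXITSTATUS(status) == 0


if __name__ == "__main__":
    print("child acquired the lock:", child_can_take_lock())
```

Recent Python versions may emit a DeprecationWarning when a multi-threaded process calls os.fork(); that warning is about exactly this class of problem.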
msg256817 - (view) Author: Yury Selivanov (yselivanov) * (Python committer) Date: 2015-12-21 23:55
Maybe instead of releasing the lock in the forked child process, we should try to acquire the lock in the os.fork() implementation, and then release it?

Otherwise, suppose that a call to getaddrinfo (call #1) takes a long time. In the middle of it we fork, and then immediately try to call getaddrinfo for some other address (call #2, while call #1 is still in progress). At this point, since getaddrinfo isn't thread-safe, something bad will happen.
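
A sketch of this alternative, assuming a Python-level threading.Lock in place of the C lock; fork_with_lock_held and demo are hypothetical names. Taking the lock around the fork means the child never inherits it in a locked state, at the cost of making the fork wait for any lookup in progress:

```python
import os
import threading
import time


def fork_with_lock_held(lock):
    """Fork only while this thread owns `lock`, then release it."""
    lock.acquire()  # waits for any in-flight lookup to finish
    try:
        pid = os.fork()
    finally:
        # Runs in both processes: the forking thread exists in the
        # child too, so the child's copy of the lock gets released.
        lock.release()
    return pid


def demo(hold_seconds=0.2):
    """Return True if the child could take the lock after the fork."""
    lock = threading.Lock()
    started = threading.Event()

    def slow_lookup():
        with lock:
            started.set()
            time.sleep(hold_seconds)

    worker = threading.Thread(target=slow_lookup)
    worker.start()
    started.wait()

    pid = fork_with_lock_held(lock)  # blocks until slow_lookup is done
    if pid == 0:
        code = 1
        try:
            code = 0 if lock.acquire(timeout=1.0) else 1
        finally:
            os._exit(code)

    _, status = os.waitpid(pid, 0)
    worker.join()
    return os.WIFEXITED(status) and os.WEXITSTATUS(status) == 0
```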
msg256920 - (view) Author: Ronald Oussoren (ronaldoussoren) * (Python committer) Date: 2015-12-23 15:47
#25924 is related to this; I filed it after reading the blog post. The lock might not be necessary on OSX, and possibly not on the other systems either.


Yury: resetting the lock in the child should be safe because after the fork the child only has a single thread that is returning from fork(2). The thread that acquired the lock does not exist in the child process.
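
The single-thread claim can be checked from Python. This is only a sketch (thread_counts is a hypothetical helper): it starts a few idle threads, forks, and reports how many threads the threading module sees on each side.

```python
import os
import threading


def thread_counts(extra_threads=3):
    """Return (threads seen in parent, threads seen in forked child)."""
    stop = threading.Event()
    workers = [threading.Thread(target=stop.wait)
               for _ in range(extra_threads)]
    for w in workers:
        w.start()
    parent_count = threading.active_count()

    pid = os.fork()
    if pid == 0:
        # Child: only the thread that called fork() was duplicated.
        code = 255
        try:
            code = threading.active_count()
        finally:
            os._exit(code)  # report the count through the exit status

    _, status = os.waitpid(pid, 0)
    stop.set()
    for w in workers:
        w.join()
    return parent_count, os.WEXITSTATUS(status)
```

Since the thread that held the lock is gone in the child, no call #1 can still be running there when the lock is reset.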
msg368876 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2020-05-14 23:14
Does the example code (which should be posted here) still hang?

If so, automated tests that hang indefinitely on failure are a nuisance.  A revised example that failed after, say, a second would be better.
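
One way to get a test that fails quickly instead of hanging is to run the suspect call in a forked child and kill it after a deadline. A sketch, with run_in_child as a hypothetical helper name and POSIX assumed:

```python
import os
import signal
import time


def run_in_child(func, timeout=1.0):
    """Run func() in a forked child; return 'ok', 'error' or 'timeout'."""
    pid = os.fork()
    if pid == 0:
        code = 1
        try:
            func()
            code = 0
        except BaseException:
            code = 1
        finally:
            os._exit(code)

    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        done, status = os.waitpid(pid, os.WNOHANG)
        if done:
            ok = os.WIFEXITED(status) and os.WEXITSTATUS(status) == 0
            return "ok" if ok else "error"
        time.sleep(0.01)

    # Deadline passed: the child is presumed stuck, so kill and reap it.
    os.kill(pid, signal.SIGKILL)
    os.waitpid(pid, 0)
    return "timeout"
```

A test could then assert that something like run_in_child(lambda: socket.getaddrinfo("localhost", 80)) does not return "timeout" while another thread is busy with a lookup.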
msg368881 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-05-14 23:42
> Maybe instead of releasing the lock in the forked child process, we should try to acquire the lock in the os.fork() implementation, and then release it?

In bpo-40089, I added _PyThread_at_fork_reinit() for this purpose: it reinitializes a lock to the unlocked state after a fork. Internally, it leaks memory on purpose and then creates a new lock, since there is no portable way to reset a lock after fork.

The problem is how to register netdb_lock of Modules/socketmodule.c into a list of locks which should be reinitialized at fork, or maybe how to register a C callback called at fork. There is a *Python* API to register a callback after a fork: os.register_at_fork().

See also the meta-issue bpo-6721: "Locks in the standard library should be sanitized on fork".
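
For comparison, this is roughly what the Python-level API looks like for a lock owned by Python code. It is only a sketch: _lookup_lock, _reinit_lookup_lock and child_can_take_lock_after_reinit are hypothetical names, and it does not help with the C-level netdb_lock. Like the C helper, it abandons the old lock in the child and creates a fresh one.

```python
import os
import threading

_lookup_lock = threading.Lock()  # hypothetical module-level lock


def _reinit_lookup_lock():
    """Replace the inherited lock with a fresh, unlocked one."""
    global _lookup_lock
    _lookup_lock = threading.Lock()


# Runs in the child process just before os.fork() returns there.
os.register_at_fork(after_in_child=_reinit_lookup_lock)


def child_can_take_lock_after_reinit(timeout=1.0):
    """Fork while a thread holds the lock; True if the child can take it."""
    started = threading.Event()
    release = threading.Event()

    def holder():
        with _lookup_lock:
            started.set()
            release.wait()

    worker = threading.Thread(target=holder)
    worker.start()
    started.wait()

    pid = os.fork()
    if pid == 0:
        code = 1
        try:
            # The global now names the new lock created by the callback.
            code = 0 if _lookup_lock.acquire(timeout=timeout) else 1
        finally:
            os._exit(code)

    _, status = os.waitpid(pid, 0)
    release.set()
    worker.join()
    return os.WIFEXITED(status) and os.WEXITSTATUS(status) == 0
```

Rebinding the global only works if callers look the lock up by name each time; code that kept its own reference to the old lock object would still see it locked in the child.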
msg368883 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-05-14 23:48
> (I wrote some more details here: https://emptysqua.re/blog/getaddrinfo-deadlock/ )

On macOS, Python is only affected if "MAC_OS_X_VERSION_MIN_REQUIRED < MAC_OS_X_VERSION_10_5". Is it still the case in 2020?

Copy/paste of socketmodule.c:

/* On systems on which getaddrinfo() is believed to not be thread-safe,
   (this includes the getaddrinfo emulation) protect access with a lock.

   getaddrinfo is thread-safe on Mac OS X 10.5 and later. Originally it was
   a mix of code including an unsafe implementation from an old BSD's
   libresolv. In 10.5 Apple reimplemented it as a safe IPC call to the
   mDNSResponder process. 10.5 is the first to be UNIX '03 certified, which
   includes the requirement that getaddrinfo be thread-safe. See issue #25924.

   It's thread-safe in OpenBSD starting with 5.4, released Nov 2013:
   http://www.openbsd.org/plus54.html

   It's thread-safe in NetBSD starting with 4.0, released Dec 2007:
   http://cvsweb.netbsd.org/bsdweb.cgi/src/lib/libc/net/getaddrinfo.c.diff?r1=1.82&r2=1.83
 */
#if ((defined(__APPLE__) && \
        MAC_OS_X_VERSION_MIN_REQUIRED < MAC_OS_X_VERSION_10_5) || \
    (defined(__FreeBSD__) && __FreeBSD_version+0 < 503000) || \
    (defined(__OpenBSD__) && OpenBSD+0 < 201311) || \
    (defined(__NetBSD__) && __NetBSD_Version__+0 < 400000000) || \
    !defined(HAVE_GETADDRINFO))
#define USE_GETADDRINFO_LOCK
#endif
msg369031 - (view) Author: Ronald Oussoren (ronaldoussoren) * (Python committer) Date: 2020-05-16 09:56
The macOS test checks if the binary targets macOS 10.4 or earlier.  Those versions of macOS have been out of support for a very long time, and we haven't had installers targeting those versions of macOS for a long time as well.  2.7 and 3.5 had installers targeting macOS 10.5, current installers target macOS 10.9. 

IMHO macOS 10.4 has moved into museum territory and I wouldn't bother supporting it anymore.

USE_GETADDRINFO_LOCK is only enabled for very old OS releases. The last OS to stop requiring it was OpenBSD, in 2013 (7 years ago); the other OSes stopped requiring it in 2007 (13 years ago).

I'd drop this code instead of fixing it.
msg369037 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-05-16 10:15
> I'd drop this code instead of fixing it.

Hum, the FreeBSD, OpenBSD and NetBSD versions which require the fix also look very old. So I agree that it has become safe to remove the fix.

Would it make sense to only fix it on Python 3.10 and leave other versions with the bug? Or should we fix all Python versions?
msg369116 - (view) Author: Ronald Oussoren (ronaldoussoren) * (Python committer) Date: 2020-05-17 12:18
Technically this would be a functional change. I'd drop this code in 3.9 and trunk (although it is awfully close to the expected date for 3.9b1).

Older versions would keep this code and the bug; that way the older Python versions can still be used on these ancient OS versions (but users might run into this race condition).
msg369220 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-05-18 13:46
I wrote PR 20177 to avoid the netdb_lock in socket.getaddrinfo(), but the lock is still used on platforms which don't provide gethostbyname_r():

#if !defined(HAVE_GETHOSTBYNAME_R) && !defined(MS_WINDOWS)
# define USE_GETHOSTBYNAME_LOCK
#endif
msg370227 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-05-28 15:23
New changeset 0de437de6210c2b32b09d6c47a805b23d023bd59 by Victor Stinner in branch 'master':
bpo-25920: Remove socket.getaddrinfo() lock on macOS (GH-20177)
https://github.com/python/cpython/commit/0de437de6210c2b32b09d6c47a805b23d023bd59
msg370229 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-05-28 15:37
If I understood correctly, Python 3.8 and 3.9 binaries provided by python.org are *not* impacted by this issue.

Only Python binaries built manually with explicit support for macOS 10.4 ("MAC_OS_X_VERSION_MIN_REQUIRED") were impacted.

Python 3.9 and older are not fixed (they keep the lock). The workaround is to require macOS 10.5 or newer. macOS 10.4 was released in 2005; it's maybe time to stop supporting it :-)

Python 3.7 (and newer) requires macOS 10.6 or newer (again, I'm talking about binaries provided by python.org).


> bpo-25920: Remove socket.getaddrinfo() lock on macOS (GH-20177)

I chose to leave the lock for gethostbyname(). Ronald wrote that this lock is no longer needed:
"As an aside (not to be addressed in the PR): Apparently gethostbyname() and related functions are thread-safe on macOS. This is according to the manpage on macOS 10.15. I haven't checked in which version that changed. This allows avoiding the use of the gethostbyname lock as well."
https://github.com/python/cpython/pull/20177#pullrequestreview-418909595

Please open a separate issue for this lock.
History
Date User Action Args
2020-05-28 15:37:32 vstinner set status: open -> closed; versions: + Python 3.10, - Python 3.9; messages: + msg370229; resolution: fixed; stage: patch review -> resolved
2020-05-28 15:23:47 vstinner set messages: + msg370227
2020-05-18 13:46:37 vstinner set messages: + msg369220
2020-05-18 13:40:27 vstinner set keywords: + patch; stage: needs patch -> patch review; pull_requests: + pull_request19476
2020-05-17 12:18:46 ronaldoussoren set messages: + msg369116
2020-05-16 10:15:41 vstinner set messages: + msg369037
2020-05-16 09:56:51 ronaldoussoren set messages: + msg369031
2020-05-15 14:20:18 hugh set nosy: + hugh
2020-05-14 23:48:02 vstinner set messages: + msg368883
2020-05-14 23:42:29 vstinner set messages: + msg368881
2020-05-14 23:14:40 terry.reedy set nosy: + terry.reedy; messages: + msg368876; versions: + Python 3.9, - Python 3.4, Python 3.5, Python 3.6
2015-12-23 15:47:29 ronaldoussoren set nosy: + ronaldoussoren; messages: + msg256920
2015-12-21 23:57:17 yselivanov set nosy: + serhiy.storchaka
2015-12-21 23:55:27 yselivanov set messages: + msg256817
2015-12-21 23:36:24 yselivanov set versions: + Python 3.4, Python 3.5, Python 3.6; nosy: + vstinner, yselivanov; components: + Interpreter Core; type: behavior; stage: needs patch
2015-12-21 23:29:43 ionelmc set nosy: + ionelmc
2015-12-21 23:19:32 emptysquare create