New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Some fstat() calls do not release the GIL, possibly hanging all threads #77202
Comments
If the file descriptor is on a non-responsive NFS server, calling In python 3, the calls are handled now by _py_fstat(), releasing the GIL
In python there are more fstat() calls to fix, affecting users of:
|
is fstat thread safe ? |
fstat is async signal safe, and I suspect it's thread safe in general, though usage does rely on the file descriptor remaining valid. If the fd in question is closed in another thread after the GIL is released, fstat would fail; if a new file is opened and assigned the same fd, it would give erroneous results. But I don't think any of that would lead to fundamentally incorrect behavior (after all, you could do the same thing manually by caching an fd then closing the file). |
josh.r. i think you are right, i was worried if a nfs sillyrename is in progress, for eg a lock file ,then server hangs but thread lift the GIL and allow another thread to try to start to fstat the same path. |
Python cannot protect raw file descriptor from bad multi-threaded This issue affects all function using raw file descriptors, and we cannot Even if fstat was not thread safe, we cannot protect it using the GIl |
Commit 4484f9d causes GCC 7.2.0 to emit a warning. cpython/Modules/mmapmodule.c: In function ‘new_mmap_object’:
cpython/Modules/mmapmodule.c:1126:18: warning: ‘fstat_result’ may be used uninitialized in this function [-Wmaybe-uninitialized]
if (fd != -1 && fstat_result == 0 && S_ISREG(status.st_mode)) { A PR is attached. |
With Benjamin we've decided that this wouldn't happen in 2.7. Performance improvements don't warrant a backport. Thank you Nir for reporting and posting the patches! |
Antoine, thanks for fixing this on master! but I don't think this issue First, the issue is not a performance but reliability. I probably made When you call mmap.mmap() in one thread, the entire process hangs for With the fix, only the thread accessing the file descriptor is affected. Second, the issue affects python 2.7, which is the production version on Here is examples of the issue using reproducer scripts I uploaded to the When mmap.mmap block, the entire process hangs. I unblocked the process from # python bpo-33021/mmap_nfs_test.py mnt dumbo.tlv.redhat.com (I remove the iptables rule here) 2018-03-17 01:18:57,683 - (MainThread) - OK When mmapobject.size() was called, the entire process was hang. I unblocked the # python bpo-33021/mmap_size_nfs_test.py mnt dumbo.tlv.redhat.com (I removed the ipatables rule here) 2018-03-17 01:23:38,701 - (MainThread) - OK I found that os.fdopen issue does not affect RHEL/CentOS 7, because they commit 5c863bf
This issue affects Fedora (python 2.7.14) and probably other distros using Here is example run show how this affects Fedora 27: # python fdopen_nfs_test.py mnt dumbo.tlv.redhat.com (remove iptbales rule, and force-unmount here) 2018-03-17 01:50:25,853 - (MainThread) - OK
2018-03-17 01:50:25,854 - (Canary) - check 11
2018-03-17 01:50:25,895 - (MainThread) - Done
Traceback (most recent call last):
File "fdopen_nfs_test.py", line 75, in <module>
os.unlink(filename)
OSError: [Errno 2] No such file or directory: 'mnt/test' So, I think we should:
I can prepare the backports and split the 2.7 patch if this helps. |
Attaching reproducer for mmapobject.size() |
Attaching reproducer for os.fdopen() |
Regarding 2.7: if folks want this fixed on RHEL/CentOS, then they need to talk to Red Hat about it, not the upstream CPython devs. I used to work there, and was told multiple times by Red Hat executives that none of their customers actually used Python, so the Python ecosystem wasn't of any strategic interest to them, and hence zero funding was available for full-time upstream CPython maintenance econtributions. As such, I no longer consider it acceptable for anyone to ask community volunteers to compensate for Red Hat's abject negligence and executive incompetence. |
Re: backporting this. I think backporting to 3.6/3.7 makes a lot of sense - bugfix and prerelease respectively. For 2.7, this bug has been around for ages, the patch is small, and I have no objection - but the RM has already said no, so I guess not happening :). That said, if I was analyzing this from scratch I'd look at the pypi download stats to assess what footprint 2.7 still has, and whether taking this fix there would make objective sense. While it is a genuine bug - and a nice catch - I have to say that running any Python with mmapped data off of NFS is a supremely bad idea :). |
Reopening for 3.x backports, as per Robert's advice. |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
The text was updated successfully, but these errors were encountered: