Author nirs
Recipients ZackerySpytz, brett.cannon, eric.snow, josh.r, ncoghlan, nirs, pitrou, pmpp, serhiy.storchaka, twouters, vstinner, yselivanov
Date 2018-03-16.23:57:41
Message-id <1521244661.85.0.467229070634.issue33021@psf.upfronthosting.co.za>
Content
Antoine, thanks for fixing this on master! But I don't think this issue
can be closed yet.

First, the issue is not about performance but about reliability. I probably
made a bad choice when I marked this as performance.

When you call mmap.mmap() in one thread and the file descriptor is on a
non-responsive NFS server, the entire process can hang for an hour.

With the fix, only the thread accessing the file descriptor is affected.
The rest of the system can function normally.
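
A minimal sketch of the behavior expected with the fix, assuming the fix
works by releasing the GIL around the blocking system call. It does not
touch NFS: a blocking os.read() on an empty pipe (which releases the GIL)
stands in for the stuck call, and count_canary_ticks is a hypothetical
helper, not part of the patch or of the uploaded scripts.

```python
import os
import threading
import time


def count_canary_ticks(block, unblock, interval=0.01):
    """Run block() in the calling thread while a canary thread ticks.

    Returns how many ticks the canary recorded before block() returned.
    """
    ticks = []
    done = threading.Event()

    def canary():
        while not done.is_set():
            ticks.append(time.monotonic())
            time.sleep(interval)

    thread = threading.Thread(target=canary, name="Canary")
    thread.daemon = True
    thread.start()

    # Release the blocked call after a while, from yet another thread.
    timer = threading.Timer(0.3, unblock)
    timer.start()
    try:
        block()
    finally:
        done.set()
        thread.join()
        timer.join()
    return len(ticks)


read_fd, write_fd = os.pipe()
# os.read() on an empty pipe blocks, but it releases the GIL while waiting,
# so the canary keeps running. This is a stand-in for a stuck NFS call.
ticks = count_canary_ticks(
    block=lambda: os.read(read_fd, 1),
    unblock=lambda: os.write(write_fd, b"x"),
)
os.close(read_fd)
os.close(write_fd)
print("canary ticks while blocked:", ticks)
```

If the blocking call held the GIL instead, the canary would stop ticking
until the call returned, which is what the logs in this message show.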

Second, the issue affects Python 2.7, which is the production version on
many servers, and will be for many years, e.g. on RHEL/CentOS 7. I think
it is important to fix this issue for these users.

Here are examples of the issue, using the reproducer scripts I uploaded to
the bug.

When mmap.mmap blocks, the entire process hangs. I unblocked the process from
another shell by removing the iptables rule.

# python bpo-33021/mmap_nfs_test.py mnt dumbo.tlv.redhat.com
2018-03-17 01:17:57,846 - (MainThread) - Starting canary thread
2018-03-17 01:17:57,846 - (Canary) - Blocking access to storage
2018-03-17 01:17:57,857 - (Canary) - If this test is hang, please run: iptables -D OUTPUT -p tcp -d dumbo.tlv.redhat.com --dport 2049 -j DROP
2018-03-17 01:17:57,857 - (Canary) - check 0
2018-03-17 01:17:58,858 - (Canary) - check 1
2018-03-17 01:17:59,858 - (Canary) - check 2
2018-03-17 01:18:00,859 - (Canary) - check 3
2018-03-17 01:18:01,859 - (Canary) - check 4
2018-03-17 01:18:02,859 - (Canary) - check 5
2018-03-17 01:18:03,860 - (Canary) - check 6
2018-03-17 01:18:04,860 - (Canary) - check 7
2018-03-17 01:18:05,861 - (Canary) - check 8
2018-03-17 01:18:06,861 - (Canary) - check 9
2018-03-17 01:18:07,862 - (Canary) - check 10
2018-03-17 01:18:07,868 - (MainThread) - Calling mmap.mmap

(I removed the iptables rule here)

2018-03-17 01:18:57,683 - (MainThread) - OK
2018-03-17 01:18:57,683 - (MainThread) - Done
2018-03-17 01:18:57,683 - (Canary) - check 11
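
For readers without the attachments, here is a rough sketch of how a canary
harness producing this kind of log could look. It is modeled only on the
output above and is not the uploaded mmap_nfs_test.py: run_with_canary and
map_and_size are hypothetical names, the file is a local temporary file
rather than an NFS mount, and no iptables rule is involved, so nothing
blocks here. The mmap context manager needs Python 3.

```python
import logging
import mmap
import tempfile
import threading
import time

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s - (%(threadName)s) - %(message)s")


def run_with_canary(func, interval=0.05, warmup=0.2):
    """Call func() in the main thread while a canary thread logs checks."""
    checks = []
    done = threading.Event()

    def canary():
        n = 0
        while not done.is_set():
            logging.info("check %d", n)
            checks.append(n)
            n += 1
            done.wait(interval)

    logging.info("Starting canary thread")
    thread = threading.Thread(target=canary, name="Canary")
    thread.daemon = True
    thread.start()
    time.sleep(warmup)  # let the canary log a few checks first
    try:
        logging.info("Calling %s", func.__name__)
        result = func()
        logging.info("OK")
        return result, checks
    finally:
        done.set()
        thread.join()
        logging.info("Done")


with tempfile.TemporaryFile() as f:
    f.write(b"x" * mmap.PAGESIZE)
    f.flush()

    def map_and_size():
        # On a healthy local file both calls return immediately.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            return m.size()

    size, checks = run_with_canary(map_and_size)
```

On an unresponsive mount, a gap in the "check N" lines between "Calling ..."
and "OK" is the sign that the blocked call also stalled the canary thread.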

When mmapobject.size() is called, the entire process hangs. I unblocked the
process from another shell by removing the iptables rule.

# python bpo-33021/mmap_size_nfs_test.py mnt dumbo.tlv.redhat.com
2018-03-17 01:22:17,991 - (MainThread) - Starting canary thread
2018-03-17 01:22:17,992 - (Canary) - Blocking access to storage
2018-03-17 01:22:18,001 - (Canary) - If this test is hang, please run: iptables -D OUTPUT -p tcp -d dumbo.tlv.redhat.com --dport 2049 -j DROP
2018-03-17 01:22:18,001 - (Canary) - check 0
2018-03-17 01:22:19,002 - (Canary) - check 1
2018-03-17 01:22:20,002 - (Canary) - check 2
2018-03-17 01:22:21,002 - (Canary) - check 3
2018-03-17 01:22:22,003 - (Canary) - check 4
2018-03-17 01:22:23,003 - (Canary) - check 5
2018-03-17 01:22:24,004 - (Canary) - check 6
2018-03-17 01:22:25,004 - (Canary) - check 7
2018-03-17 01:22:26,004 - (Canary) - check 8
2018-03-17 01:22:27,005 - (Canary) - check 9
2018-03-17 01:22:28,005 - (MainThread) - Calling mmapobject.size

(I removed the iptables rule here)

2018-03-17 01:23:38,701 - (MainThread) - OK
2018-03-17 01:23:38,701 - (MainThread) - Done
2018-03-17 01:23:38,701 - (Canary) - check 10

I found that the os.fdopen issue does not affect RHEL/CentOS 7, because they
use Python 2.7.5, and the issue was introduced in Python 2.7.7, in:

commit 5c863bf93809cefeb4469512eadac291b7046051
Author: Benjamin Peterson <benjamin@python.org>
Date:   Mon Apr 14 19:45:46 2014 -0400

    when an exception is raised in fdopen, never close the fd (changing on my mind on #21191)

This issue affects Fedora (Python 2.7.14) and probably other distros using
the latest Python 2.7.

Here is an example run showing how this affects Fedora 27:

# python fdopen_nfs_test.py mnt dumbo.tlv.redhat.com
2018-03-17 01:43:52,718 - (MainThread) - Starting canary thread
2018-03-17 01:43:52,718 - (Canary) - Blocking access to storage
2018-03-17 01:43:52,823 - (Canary) - If this test is hang, please run: iptables -D OUTPUT -p tcp -d dumbo.tlv.redhat.com --dport 2049 -j DROP
2018-03-17 01:43:52,824 - (Canary) - check 0
2018-03-17 01:43:53,824 - (Canary) - check 1
2018-03-17 01:43:54,824 - (Canary) - check 2
2018-03-17 01:43:55,825 - (Canary) - check 3
2018-03-17 01:43:56,825 - (Canary) - check 4
2018-03-17 01:43:57,825 - (Canary) - check 5
2018-03-17 01:43:58,826 - (Canary) - check 6
2018-03-17 01:43:59,826 - (Canary) - check 7
2018-03-17 01:44:00,826 - (Canary) - check 8
2018-03-17 01:44:01,827 - (Canary) - check 9
2018-03-17 01:44:02,827 - (Canary) - check 10
2018-03-17 01:44:02,834 - (MainThread) - Calling os.fdopen

(I removed the iptables rule and force-unmounted here)

2018-03-17 01:50:25,853 - (MainThread) - OK
2018-03-17 01:50:25,854 - (Canary) - check 11
2018-03-17 01:50:25,895 - (MainThread) - Done
Traceback (most recent call last):
  File "fdopen_nfs_test.py", line 75, in <module>
    os.unlink(filename)
OSError: [Errno 2] No such file or directory: 'mnt/test'
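
The traceback above appears to be a side effect of the force-unmount rather
than of the os.fdopen hang itself: the cleanup step calls os.unlink on a path
that is no longer there. A small sketch of a cleanup that tolerates this,
assuming that ignoring a missing file is acceptable for a test script
(unlink_if_exists is a hypothetical helper, not code from
fdopen_nfs_test.py):

```python
import errno
import os
import tempfile


def unlink_if_exists(path):
    """Remove path; return False instead of raising if it is already gone."""
    try:
        os.unlink(path)
    except OSError as e:
        # Only swallow "No such file or directory"; re-raise anything else.
        if e.errno != errno.ENOENT:
            raise
        return False
    return True


fd, path = tempfile.mkstemp()
os.close(fd)
first = unlink_if_exists(path)   # file exists, so it is removed
second = unlink_if_exists(path)  # already gone, ENOENT is ignored
```

The errno check is written so that it behaves the same on Python 2.7 and 3.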


So, I think we should:
- backport to 3.7 and 3.6
- reconsider a backport to 2.7, at least for mmap and os.fdopen.

I can prepare the backports and split the 2.7 patch if this helps.