classification
Title: test_selectors.PollSelectorTestCase.test_above_fd_setsize reported killed by shell
Type: crash Stage: resolved
Components: asyncio Versions: Python 3.4, Python 3.5
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: gvanrossum, neologix, python-dev, r.david.murray, vstinner, yselivanov
Priority: normal Keywords:

Created on 2014-07-01 22:40 by r.david.murray, last changed 2014-07-26 21:56 by r.david.murray. This issue is now closed.

Messages (14)
msg222059 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2014-07-01 22:40
On one particular Linux vserver virtual machine (which is unfortunately my development platform for Python), test.test_selectors.PollSelectorTestCase.test_above_fd_setsize fails with the following message:

   zsh: killed

and at that point the test suite stops running, regardless of whether or not I started it with -j.

As far as I can tell, the configuration of this vserver is the same as the one my buildbots run on, but they are on different host machines, so there could be some differences I'm not remembering.  On the buildbots, the test gets skipped with the message 'FD limit reached'.

Anyone have any clues how to debug this?
msg222062 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2014-07-02 00:20
The test changes the maximum number of open files. What is the limit in your shell? You can try to modify the test to add print(soft, hard) after getrlimit().

On Fedora 20:

$ python -c 'import resource; print(resource.getrlimit(resource.RLIMIT_NOFILE))'
(1024, 4096)

The test tries to raise the soft limit (1024) to the hard limit (4096).
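
A minimal sketch of that check, assuming Python 3 on a POSIX system: it reads RLIMIT_NOFILE, tries to raise the soft limit to the hard limit, and restores the original values afterwards. The helper name `probe_nofile_limit` is hypothetical and is not part of the test suite.

```python
import resource


def probe_nofile_limit():
    """Report RLIMIT_NOFILE and whether the soft limit can reach the hard one.

    Returns (soft, hard, raised). The original limits are restored before
    returning, so the call leaves the process unchanged.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    try:
        # Raise only the soft limit; the hard limit is left as it is.
        resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))
    except (OSError, ValueError):
        # The platform refused the change (for example EPERM or EINVAL).
        return soft, hard, False
    # Put the original soft limit back.
    resource.setrlimit(resource.RLIMIT_NOFILE, (soft, hard))
    return soft, hard, True


if __name__ == "__main__":
    soft, hard, raised = probe_nofile_limit()
    print("soft=%r hard=%r raised=%r" % (soft, hard, raised))
```

If `raised` comes back true and `hard` is very large, a test that opens as many descriptors as the hard limit allows can consume a lot of memory.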
msg222075 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2014-07-02 06:50
There's probably a special mechanism due to vserver which makes the
kernel kill the process instead of failing with EPERM, but it's really
surprising.

What happens if you try the following:
$ python -c "from resource import *; _, hard = getrlimit(RLIMIT_NOFILE); setrlimit(RLIMIT_NOFILE, (hard, hard))"

You could run the process under strace to see what's going on: you'll
likely just see the reception of a signal though. Maybe "dmesg" would
show interesting logs.
msg222534 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2014-07-07 22:55
ping?
msg222951 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2014-07-13 16:29
The python command just returns.

The dmesg was a good call:

python invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=0
python cpuset=pydev mems_allowed=0
[...]
Out of memory: kill process python(28623:#112) score 85200 or a child
Killed process python(28623:#112) vsz:340800kB, anon-rss:330764kB, file-rss:3864kB

I *thought* I had this virtual server configured with the same resources as I do the buildbots, but I could be wrong.  It's been quite some time since I set both of them up, and I don't even remember how the resources are set at the moment.

Let me know if you want to see the entire dmesg output.
msg223002 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2014-07-14 08:36
> Killed process python(28623:#112) vsz:340800kB, anon-rss:330764kB, file-rss:3864kB

340 MB to run test_selectors sounds high.

What is the value of NUM_FDS? And what is the result of this command in your vserver?

$ python -c 'import resource; print(resource.getrlimit(resource.RLIMIT_NOFILE))'
(1024, 4096)
msg223165 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2014-07-16 01:25
rdmurray@pydev:~/python/p34>python -c 'import resource; print(resource.getrlimit(resource.RLIMIT_NOFILE))'
(1024L, 1048576L)

Unfortunately the buildbot box is offline at the moment and it may be a bit before I can get it back, so I can't compare the results above with that VM.
msg223181 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2014-07-16 08:20
> rdmurray@pydev:~/python/p34>python -c 'import resource; print(resource.getrlimit(resource.RLIMIT_NOFILE))'
> (1024L, 1048576L)

Oh, 1 million files is much bigger than 4 thousand files (4096).

The test should only test FD_SETSIZE + 10 files; the problem is how to get FD_SETSIZE:

        # A scalable implementation should have no problem with more than
        # FD_SETSIZE file descriptors. Since we don't know the value, we just
        # try to set the soft RLIMIT_NOFILE to the hard RLIMIT_NOFILE ceiling.

For example, on my Linux FD_SETSIZE is 1024, whereas the hard limit of RLIMIT_NOFILE is 4096.

/usr/include/linux/posix_types.h:#define __FD_SETSIZE	1024

Maybe we can simply expose the FD_SETSIZE constant in the select module? The constant is useful when you use select.select(), which is still heavily used on Windows.
msg223563 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2014-07-21 07:08
>> rdmurray@pydev:~/python/p34>python -c 'import resource; print(resource.getrlimit(resource.RLIMIT_NOFILE))'
>> (1024L, 1048576L)
>
> Oh, 1 million files is much bigger than 4 thousand files (4096).
>
> The test should only test FD_SETSIZE + 10 files; the problem is how to get FD_SETSIZE:

We could cap it to, let's say, 2**16: that's larger than any possible
FD_SETSIZE (which is usually low, since fd_sets are often allocated on
the stack and select() doesn't scale well beyond that anyway).

But I don't see anything wrong with the test, it's really the buildbot
setting which is to blame: I expect other tests to fail with such a
low max virtual memory.
msg223571 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2014-07-21 10:58
That is the only test that fails for lack of memory.  And it's not the buildbot, it's my development virtual machine.  Having the test suite be killed when I do a full test run is...rather annoying.
msg223573 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2014-07-21 11:17
Alright, I'll cap the value then (no need to expose FD_SETSIZE).
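
A rough sketch of the capping idea, using the 2**16 value proposed above. The function name `pick_num_fds`, the `raised` flag and the RLIM_INFINITY branch are illustrative assumptions; this is not the code from the actual changeset.

```python
import resource

FD_CAP = 2**16  # cap proposed in this thread


def pick_num_fds(soft, hard, raised, cap=FD_CAP):
    """Choose how many file descriptors the test should try to use.

    soft, hard: the RLIMIT_NOFILE pair.
    raised: True if the soft limit was successfully raised to the hard limit.
    """
    if not raised:
        # The soft limit could not be raised, so stay within it.
        return soft
    if hard == resource.RLIM_INFINITY:
        # Assumption: treat an unlimited hard limit as "use the cap".
        return cap
    return min(hard, cap)


# Limits reported in this thread:
print(pick_num_fds(1024, 4096, True))      # 4096
print(pick_num_fds(1024, 1048576, True))   # 65536
print(pick_num_fds(1024, 1048576, False))  # 1024
```

With a hard limit of 1048576, the cap keeps the test at 65536 descriptors instead of about one million.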
msg223691 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2014-07-22 20:30
New changeset 7238c6a05ca6 by Charles-François Natali in branch '3.4':
Issue #21901: Cap the maximum number of file descriptors to use for the test.
http://hg.python.org/cpython/rev/7238c6a05ca6

New changeset 89665cc05592 by Charles-François Natali in branch 'default':
Issue #21901: Cap the maximum number of file descriptors to use for the test.
http://hg.python.org/cpython/rev/89665cc05592
msg223696 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2014-07-22 20:52
Sorry for the delay, should be fixed now.
msg224088 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2014-07-26 21:56
Test passes for me now, thanks.
History
Date User Action Args
2014-07-26 21:56:32 r.david.murray set messages: + msg224088
2014-07-22 20:52:49 neologix set status: open -> closed; resolution: fixed; messages: + msg223696; stage: resolved
2014-07-22 20:30:30 python-dev set nosy: + python-dev; messages: + msg223691
2014-07-21 11:17:39 neologix set messages: + msg223573
2014-07-21 10:58:45 r.david.murray set messages: + msg223571
2014-07-21 07:08:37 neologix set messages: + msg223563
2014-07-16 08:20:13 vstinner set messages: + msg223181
2014-07-16 01:25:53 r.david.murray set messages: + msg223165
2014-07-14 08:36:32 vstinner set messages: + msg223002
2014-07-13 16:29:05 r.david.murray set messages: + msg222951
2014-07-07 22:55:22 vstinner set messages: + msg222534
2014-07-02 06:50:57 neologix set messages: + msg222075
2014-07-02 00:20:54 vstinner set messages: + msg222062
2014-07-01 22:47:45 r.david.murray set title: test_selectors.PollSelectorTestCase.test_above_fd_setsize killed by shell -> test_selectors.PollSelectorTestCase.test_above_fd_setsize reported killed by shell
2014-07-01 22:40:03 r.david.murray create