classification
Title: Large Involuntary context switches during oom-killer
Type: resource usage Stage:
Components: Versions: Python 3.2, Python 3.3, Python 3.4, Python 3.5, Python 2.7
process
Status: closed Resolution: not a bug
Dependencies: Superseder:
Assigned To: Nosy List: DamianMyerscough, pitrou, rbcollins, vstinner
Priority: normal Keywords:

Created on 2016-03-14 21:54 by DamianMyerscough, last changed 2016-03-15 23:53 by DamianMyerscough. This issue is now closed.

Messages (7)
msg261781 - (view) Author: Damian Myerscough (DamianMyerscough) Date: 2016-03-14 21:54
I have been running a simple script inside a Docker container to cause the container to trigger oom-killer.

>>> mem = {}
>>> for i in range(65535):
...     mem[i] = "A" * 65535

When oom-killer is trigger I see a large number of involuntary context switches and major page faults which causes a spike in CPU
and disk.


# /usr/bin/time --verbose python
Python 2.7.9 (default, Mar  1 2015, 12:57:24) 
[GCC 4.9.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> mem = {}
>>> for i in range(65535):
...     mem[i] = "A" * 65535
... 
Command terminated by signal 9
	Command being timed: "python"
	User time (seconds): 0.13
	System time (seconds): 3.83
	Percent of CPU this job got: 17%
	Elapsed (wall clock) time (h:mm:ss or m:ss): 0:22.94
	Average shared text size (kbytes): 0
	Average unshared data size (kbytes): 0
	Average stack size (kbytes): 0
	Average total size (kbytes): 0
	Maximum resident set size (kbytes): 2096516
	Average resident set size (kbytes): 0
	Major (requiring I/O) page faults: 33536
	Minor (reclaiming a frame) page faults: 524545
	Voluntary context switches: 30706
	Involuntary context switches: 137852
	Swaps: 0
	File system inputs: 8499072
	File system outputs: 0
	Socket messages sent: 0
	Socket messages received: 0
	Signals delivered: 0
	Page size (bytes): 4096
	Exit status: 0


I tried the same test using NodeJS/C++ and I could see a lot less involuntary context switches and major page faults which indicates this could
be a Python issue.

# /usr/bin/time --verbose ./alloc_forever 1024 5 524288000
initial_alloc_amount: 1024, sleep_duration: 5, alloc_amount: 524288000
memory: 501.07MB
memory: 1001.29MB
memory: 1501.19MB
memory: 2001.09MB
Command terminated by signal 9
	Command being timed: "./alloc_forever 1024 5 524288000"
	User time (seconds): 0.63
	System time (seconds): 0.97
	Percent of CPU this job got: 7%
	Elapsed (wall clock) time (h:mm:ss or m:ss): 0:21.61
	Average shared text size (kbytes): 0
	Average unshared data size (kbytes): 0
	Average stack size (kbytes): 0
	Average total size (kbytes): 0
	Maximum resident set size (kbytes): 2096536
	Average resident set size (kbytes): 0
	Major (requiring I/O) page faults: 11
	Minor (reclaiming a frame) page faults: 524178
	Voluntary context switches: 17
	Involuntary context switches: 284
	Swaps: 0
	File system inputs: 336
	File system outputs: 0
	Socket messages sent: 0
	Socket messages received: 0
	Signals delivered: 0
	Page size (bytes): 4096
	Exit status: 0
msg261782 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-03-14 22:00
I don't understand this issue... I don't see how Python can avoid context switches when writing data to memory...
msg261783 - (view) Author: Robert Collins (rbcollins) * (Python committer) Date: 2016-03-14 22:21
So this test script is making a 65K entry dict, and each item is a new, separate 65K string. The strings are allocated in single chunks, so we should expect  couple hundred reference count writes total.

AIUI involuntary context switches occur when there is CPU contention. What else do you have running? 

Notable Python is taking 3.83 seconds of system time, much more than your other tests. You may get some insight from strace -c.
msg261795 - (view) Author: Damian Myerscough (DamianMyerscough) Date: 2016-03-15 02:32
@rbcollins there is nothing else running in the Docker container other than the python interpreter. I did an `strace` of the process

mmap(NULL, 3149824, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7effc2d0c000
munmap(0x7effc300d000, 790528)          = 0


@haypo, this only with Python when it is killed by oom-killer. NodeJS/C++ dont show nearly as many involuntary context switches or page faults. I logged the bug to see if anyone else has encountered this.
msg261806 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2016-03-15 09:55
Why would it be a Python issue? By construction I doubt the OOM killer lets the process run anything before force-killing it.
msg261809 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-03-15 11:31
> When oom-killer is trigger

IMHO this is your root problem.

> When oom-killer is trigger I see a large number of involuntary context switches ...

The principle of *involuntarty* context switches is that the application is not responsible for them.

> I tried the same test using NodeJS/C++ and I could see a lot less involuntary context switches and major page faults which indicates this could be a Python issue.

Your rationale cannot be good.

It is not a Python bug.

OOM Killer is not a bug but a feature to limit abuse of resources on Linux.

But yeah, in my experience, Linux becomes crazy when it reachs OOM Killer. You can try to tune your kernel, but it's much better to always avoid using all memory ;-)
msg261836 - (view) Author: Damian Myerscough (DamianMyerscough) Date: 2016-03-15 23:53
Thanks for the feedback, I will continue to dig into this. 

I know processes go crazy sometimes when OOM killer kicks off, I just wanted to rule out a Python bug or if anyone in the community has seen this before.

Thanks
History
Date User Action Args
2016-03-15 23:53:24DamianMyerscoughsetmessages: + msg261836
2016-03-15 11:31:26vstinnersetstatus: open -> closed
resolution: not a bug
messages: + msg261809
2016-03-15 09:55:52pitrousetnosy: + pitrou
messages: + msg261806
2016-03-15 02:32:21DamianMyerscoughsetmessages: + msg261795
2016-03-14 22:21:15rbcollinssetnosy: + rbcollins
messages: + msg261783
2016-03-14 22:00:54vstinnersetnosy: + vstinner
messages: + msg261782
2016-03-14 21:54:12DamianMyerscoughcreate