Issue35988
This issue tracker has been migrated to GitHub,
and is currently read-only.
For more information,
see the GitHub FAQs in the Python's Developer Guide.
Created on 2019-02-13 14:46 by hinxx, last changed 2022-04-11 14:59 by admin. This issue is now closed.
Messages (9) | |||
---|---|---|---|
msg335447 - (view) | Author: Hinko Kocevar (hinxx) | Date: 2019-02-13 14:46 | |
I'm running a tornado server with websockets client. Every now and then the python3.5 crashes, seg faults. I added code tracking (https://stackoverflow.com/questions/2663841/python-tracing-a-segmentation-fault) and this is what I see: line, /usr/lib64/python3.5/asyncio/events.py:675 call, /usr/lib64/python3.5/asyncio/events.py:621 line, /usr/lib64/python3.5/asyncio/events.py:627 line, /usr/lib64/python3.5/asyncio/events.py:628 line, /usr/lib64/python3.5/asyncio/events.py:629 return, /usr/lib64/python3.5/asyncio/events.py:629 line, /usr/lib64/python3.5/asyncio/events.py:676 line, /usr/lib64/python3.5/asyncio/events.py:677 return, /usr/lib64/python3.5/asyncio/events.py:677 line, /usr/lib64/python3.5/asyncio/futures.py:172 line, /usr/lib64/python3.5/asyncio/futures.py:173 call, /usr/lib64/python3.5/asyncio/base_events.py:1461 line, /usr/lib64/python3.5/asyncio/base_events.py:1462 return, /usr/lib64/python3.5/asyncio/base_events.py:1462 return, /usr/lib64/python3.5/asyncio/futures.py:173 line, /usr/lib64/python3.5/site-packages/tornado/iostream.py:584 call, /usr/lib64/python3.5/asyncio/futures.py:315 line, /usr/lib64/python3.5/asyncio/futures.py:322 line, /usr/lib64/python3.5/asyncio/futures.py:325 return, /usr/lib64/python3.5/asyncio/futures.py:325 line, /usr/lib64/python3.5/site-packages/tornado/iostream.py:585 line, /usr/lib64/python3.5/site-packages/tornado/iostream.py:586 line, /usr/lib64/python3.5/site-packages/tornado/iostream.py:587 call, /usr/lib64/python3.5/site-packages/tornado/iostream.py:1045 line, /usr/lib64/python3.5/site-packages/tornado/iostream.py:1046 line, /usr/lib64/python3.5/site-packages/tornado/iostream.py:1047 call, /usr/lib64/python3.5/site-packages/tornado/iostream.py:134 line, /usr/lib64/python3.5/site-packages/tornado/iostream.py:135 return, /usr/lib64/python3.5/site-packages/tornado/iostream.py:135 line, /usr/lib64/python3.5/site-packages/tornado/iostream.py:1048 line, /usr/lib64/python3.5/site-packages/tornado/iostream.py:1050 line, /usr/lib64/python3.5/site-packages/tornado/iostream.py:1051 line, /usr/lib64/python3.5/site-packages/tornado/iostream.py:1052 line, /usr/lib64/python3.5/site-packages/tornado/iostream.py:1060 call, /usr/lib64/python3.5/site-packages/tornado/iostream.py:163 line, /usr/lib64/python3.5/site-packages/tornado/iostream.py:168 line, /usr/lib64/python3.5/site-packages/tornado/iostream.py:169 line, /usr/lib64/python3.5/site-packages/tornado/iostream.py:170 line, /usr/lib64/python3.5/site-packages/tornado/iostream.py:174 line, /usr/lib64/python3.5/site-packages/tornado/iostream.py:175 line, /usr/lib64/python3.5/site-packages/tornado/iostream.py:176 return, /usr/lib64/python3.5/site-packages/tornado/iostream.py:176 call, /usr/lib64/python3.5/site-packages/tornado/iostream.py:1247 line, /usr/lib64/python3.5/site-packages/tornado/iostream.py:1248 line, /usr/lib64/python3.5/site-packages/tornado/iostream.py:1249 Program received signal SIGPIPE, Broken pipe. 0x00007ffff76f7bfb in __libc_send (fd=14, buf=0x118a0b0, n=2332, flags=0) at ../sysdeps/unix/sysv/linux/x86_64/send.c:31 31 ssize_t result = INLINE_SYSCALL (sendto, 6, fd, buf, n, flags, NULL, Missing separate debuginfos, use: debuginfo-install python35u-3.5.6-1.ius.centos7.x86_64 (gdb) (gdb) (gdb) (gdb) bt #0 0x00007ffff76f7bfb in __libc_send (fd=14, buf=0x118a0b0, n=2332, flags=0) at ../sysdeps/unix/sysv/linux/x86_64/send.c:31 #1 0x00007fffed7c6c66 in sock_send_impl () from /usr/lib64/python3.5/lib-dynload/_socket.cpython-35m-x86_64-linux-gnu.so #2 0x00007fffed7c9e06 in sock_call_ex () from /usr/lib64/python3.5/lib-dynload/_socket.cpython-35m-x86_64-linux-gnu.so #3 0x00007fffed7ca79f in sock_send () from /usr/lib64/python3.5/lib-dynload/_socket.cpython-35m-x86_64-linux-gnu.so #4 0x00007ffff79b60d9 in PyCFunction_Call () from /lib64/libpython3.5m.so.1.0 #5 0x00007ffff7a2f646 in PyEval_EvalFrameEx () from /lib64/libpython3.5m.so.1.0 #6 0x00007ffff7a2ea88 in PyEval_EvalFrameEx () from /lib64/libpython3.5m.so.1.0 #7 0x00007ffff7a2ea88 in PyEval_EvalFrameEx () from /lib64/libpython3.5m.so.1.0 #8 0x00007ffff7a30ebc in _PyEval_EvalCodeWithName () from /lib64/libpython3.5m.so.1.0 #9 0x00007ffff7a2e22f in PyEval_EvalFrameEx () from /lib64/libpython3.5m.so.1.0 #10 0x00007ffff7a30ebc in _PyEval_EvalCodeWithName () from /lib64/libpython3.5m.so.1.0 #11 0x00007ffff7a2e22f in PyEval_EvalFrameEx () from /lib64/libpython3.5m.so.1.0 #12 0x00007ffff7a30ebc in _PyEval_EvalCodeWithName () from /lib64/libpython3.5m.so.1.0 #13 0x00007ffff7a2e22f in PyEval_EvalFrameEx () from /lib64/libpython3.5m.so.1.0 #14 0x00007ffff7a30ebc in _PyEval_EvalCodeWithName () from /lib64/libpython3.5m.so.1.0 #15 0x00007ffff7a2e22f in PyEval_EvalFrameEx () from /lib64/libpython3.5m.so.1.0 #16 0x00007ffff79924c0 in gen_send_ex () from /lib64/libpython3.5m.so.1.0 #17 0x00007ffff7a2ec28 in PyEval_EvalFrameEx () from /lib64/libpython3.5m.so.1.0 #18 0x00007ffff7a30ebc in _PyEval_EvalCodeWithName () from /lib64/libpython3.5m.so.1.0 #19 0x00007ffff7a30fc8 in PyEval_EvalCodeEx () from /lib64/libpython3.5m.so.1.0 #20 0x00007ffff799958e in function_call () from /lib64/libpython3.5m.so.1.0 #21 0x00007ffff796e2aa in PyObject_Call () from /lib64/libpython3.5m.so.1.0 #22 0x00007ffff7a2be87 in PyEval_EvalFrameEx () from /lib64/libpython3.5m.so.1.0 #23 0x00007ffff7a2ea88 in PyEval_EvalFrameEx () from /lib64/libpython3.5m.so.1.0 #24 0x00007ffff7a2ea88 in PyEval_EvalFrameEx () from /lib64/libpython3.5m.so.1.0 #25 0x00007ffff7a2ea88 in PyEval_EvalFrameEx () from /lib64/libpython3.5m.so.1.0 #26 0x00007ffff7a2ea88 in PyEval_EvalFrameEx () from /lib64/libpython3.5m.so.1.0 #27 0x00007ffff7a2ea88 in PyEval_EvalFrameEx () from /lib64/libpython3.5m.so.1.0 #28 0x00007ffff7a30ebc in _PyEval_EvalCodeWithName () from /lib64/libpython3.5m.so.1.0 #29 0x00007ffff7a30fc8 in PyEval_EvalCodeEx () from /lib64/libpython3.5m.so.1.0 #30 0x00007ffff7a3100b in PyEval_EvalCode () from /lib64/libpython3.5m.so.1.0 #31 0x00007ffff7a50434 in run_mod () from /lib64/libpython3.5m.so.1.0 #32 0x00007ffff7a528bd in PyRun_FileExFlags () from /lib64/libpython3.5m.so.1.0 #33 0x00007ffff7a52a27 in PyRun_SimpleFileExFlags () from /lib64/libpython3.5m.so.1.0 #34 0x00007ffff7a68bec in Py_Main () from /lib64/libpython3.5m.so.1.0 #35 0x0000000000400a29 in main () (gdb) |
|||
msg335449 - (view) | Author: STINNER Victor (vstinner) * | Date: 2019-02-13 15:25 | |
> Every now and then the python3.5 crashes, seg faults. Sorry but Python 3.5 no longer accepts bugfixes and has known bugs. http://devguide.python.org/#status-of-python-branches It's time to upgrade your Python! Are you able to reproduce the crash on Python 3.6 or 3.7? |
|||
msg335450 - (view) | Author: STINNER Victor (vstinner) * | Date: 2019-02-13 15:27 | |
> Program received signal SIGPIPE, Broken pipe. SIGPIPE is not a bug, it's a deliberate signal to notice something to your application. You have to ignore SIGPIPE in gdb: (gdb) handle SIGPIPE nostop noprint Signal Stop Print Pass to program Description SIGPIPE No No Yes Broken pipe |
|||
msg335451 - (view) | Author: STINNER Victor (vstinner) * | Date: 2019-02-13 15:28 | |
You can enable faulthandler to get a traceback of your Python theads when a crash occurs: https://pythondev.readthedocs.io/debug_tools.html#get-a-traceback-on-a-crash |
|||
msg335454 - (view) | Author: Hinko Kocevar (hinxx) | Date: 2019-02-13 15:41 | |
> Are you able to reproduce the crash on Python 3.6 or 3.7? I'm running it under python3.6 as we speak. |
|||
msg335456 - (view) | Author: Hinko Kocevar (hinxx) | Date: 2019-02-13 15:46 | |
Using python3.6 now. I've also used the 'handle SIGPIPE nostop noprint' to skip SIGPIPE. Now the interpreter is killed instead: INF 2019-02-13 15:42:19,131 web:2162 log_request - 200 GET /NPM.html (10.0.6.209) 1.10ms [Thread 0x7fffe76eb700 (LWP 8411) exited] [Thread 0x7fffe78ec700 (LWP 8410) exited] [Thread 0x7fffe7a2d700 (LWP 8409) exited] [Thread 0x7fffe7d41700 (LWP 8408) exited] [Thread 0x7fffe8542700 (LWP 8407) exited] Program terminated with signal SIGKILL, Killed. The program no longer exists. (gdb) bt No stack. (gdb) And the dmesg shows, what I think is the real reason, for python3.5/3.6 dying on me (out of memory): [26194250.313676] vmtoolsd invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0 [26194250.313691] vmtoolsd cpuset=/ mems_allowed=0 [26194250.313696] CPU: 0 PID: 29876 Comm: vmtoolsd Not tainted 3.10.0-693.21.1.el7.x86_64 #1 [26194250.313697] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 04/05/2016 [26194250.313698] Call Trace: [26194250.313714] [<ffffffff816ae7c8>] dump_stack+0x19/0x1b [26194250.313717] [<ffffffff816a9b90>] dump_header+0x90/0x229 [26194250.313725] [<ffffffff810ecec2>] ? ktime_get_ts64+0x52/0xf0 [26194250.313733] [<ffffffff8114140f>] ? delayacct_end+0x8f/0xb0 [26194250.313737] [<ffffffff8118a884>] oom_kill_process+0x254/0x3d0 [26194250.313739] [<ffffffff8118a32d>] ? oom_unkillable_task+0xcd/0x120 [26194250.313740] [<ffffffff8118a3d6>] ? find_lock_task_mm+0x56/0xc0 [26194250.313742] [<ffffffff8118b0c6>] out_of_memory+0x4b6/0x4f0 [26194250.313744] [<ffffffff816aa694>] __alloc_pages_slowpath+0x5d6/0x724 [26194250.313747] [<ffffffff811912a5>] __alloc_pages_nodemask+0x405/0x420 [26194250.313752] [<ffffffff811d5a38>] alloc_pages_current+0x98/0x110 [26194250.313757] [<ffffffff81186da7>] __page_cache_alloc+0x97/0xb0 [26194250.313759] [<ffffffff81189370>] filemap_fault+0x170/0x420 [26194250.313869] [<ffffffffc022b1e6>] ext4_filemap_fault+0x36/0x50 [ext4] [26194250.313875] [<ffffffff811b19d8>] __do_fault+0x58/0xe0 [26194250.313876] [<ffffffff811b1e7b>] do_read_fault.isra.44+0x4b/0x130 [26194250.313880] [<ffffffff811b6781>] handle_mm_fault+0x691/0xfa0 [26194250.313888] [<ffffffff810cb364>] ? cputime_adjust+0x94/0x150 [26194250.313893] [<ffffffff816bb504>] __do_page_fault+0x154/0x450 [26194250.313894] [<ffffffff816bb835>] do_page_fault+0x35/0x90 [26194250.313898] [<ffffffff816b7768>] page_fault+0x28/0x30 [26194250.313912] Mem-Info: [26194250.313915] active_anon:317608 inactive_anon:106541 isolated_anon:0 active_file:836 inactive_file:1338 isolated_file:0 unevictable:0 dirty:0 writeback:0 unstable:0 slab_reclaimable:5892 slab_unreclaimable:7233 mapped:394 shmem:334 pagetables:5322 bounce:0 free:12861 free_pcp:12 free_cma:0 [26194250.313920] Node 0 DMA free:7624kB min:380kB low:472kB high:568kB active_anon:3680kB inactive_anon:3828kB active_file:4kB inactive_file:96kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15992kB managed:15908kB mlocked:0kB dirty:0kB writeback:0kB mapped:4kB shmem:4kB slab_reclaimable:96kB slab_unreclaimable:52kB kernel_stack:32kB pagetables:160kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:492 all_unreclaimable? yes [26194250.313925] lowmem_reserve[]: 0 1822 1822 1822 [26194250.313927] Node 0 DMA32 free:43820kB min:44672kB low:55840kB high:67008kB active_anon:1266752kB inactive_anon:422336kB active_file:3340kB inactive_file:5256kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:2080704kB managed:1867756kB mlocked:0kB dirty:0kB writeback:0kB mapped:1572kB shmem:1332kB slab_reclaimable:23472kB slab_unreclaimable:28880kB kernel_stack:3200kB pagetables:21128kB unstable:0kB bounce:0kB free_pcp:48kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:11345 all_unreclaimable? yes [26194250.313930] lowmem_reserve[]: 0 0 0 0 [26194250.313932] Node 0 DMA: 4*4kB (UEM) 9*8kB (UEM) 9*16kB (UEM) 3*32kB (E) 2*64kB (UE) 2*128kB (UE) 3*256kB (UM) 4*512kB (UEM) 4*1024kB (EM) 0*2048kB 0*4096kB = 7624kB [26194250.313939] Node 0 DMA32: 386*4kB (E) 482*8kB (UE) 324*16kB (UEM) 247*32kB (UEM) 157*64kB (UEM) 56*128kB (UEM) 18*256kB (UEM) 7*512kB (UEM) 0*1024kB 0*2048kB 0*4096kB = 43896kB [26194250.313945] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [26194250.313946] 3429 total pagecache pages [26194250.313947] 890 pages in swap cache [26194250.313950] Swap cache stats: add 52552391, delete 52551501, find 24707502/29312733 [26194250.313951] Free swap = 0kB [26194250.313951] Total swap = 2047996kB [26194250.313952] 524174 pages RAM [26194250.313953] 0 pages HighMem/MovableOnly [26194250.313953] 53258 pages reserved [26194250.313954] [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name [26194250.313963] [ 564] 0 564 9307 371 23 46 0 systemd-journal [26194250.313965] [ 751] 81 751 9244 104 21 89 -900 dbus-daemon [26194250.313967] [ 783] 0 783 138064 94 101 9333 0 NetworkManager [26194250.313968] [ 849] 0 849 28343 0 58 3123 0 dhclient [26194250.313970] [ 1016] 0 1016 8818 35 19 236 0 systemd-logind [26194250.313971] [ 1033] 0 1033 27511 0 10 29 0 agetty [26194250.313973] [ 1063] 0 1063 58637 1 70 2842 0 salt-minion [26194250.313974] [ 1064] 0 1064 7674 0 20 83 0 oddjobd [26194250.313976] [ 1855] 0 1855 188662 798 115 5821 0 salt-minion [26194250.313977] [ 6990] 0 6990 5418 39 16 42 0 irqbalance [26194250.313979] [ 7014] 38 7014 9491 36 19 117 0 ntpd [26194250.313980] [ 1093] 1768605311 1093 9175 10 19 75 0 caRepeater [26194250.313982] [ 2463] 0 2463 28203 0 57 257 -1000 sshd [26194250.313984] [23726] 0 23726 11089 1 25 112 -1000 systemd-udevd [26194250.313985] [29632] 0 29632 90496 237 97 6261 0 firewalld [26194250.313987] [29874] 0 29874 24891 0 42 403 0 VGAuthService [26194250.313988] [29876] 0 29876 56747 91 56 246 0 vmtoolsd [26194250.313990] [29942] 0 29942 31206 0 27 90 0 lvmetad [26194250.313991] [ 7357] 0 7357 22907 5 45 255 0 master [26194250.313993] [ 7359] 89 7359 25518 0 48 266 0 qmgr [26194250.313994] [ 7387] 0 7387 15511 17 30 139 -1000 auditd [26194250.313995] [ 7432] 0 7432 31572 19 18 137 0 crond [26194250.313997] [ 7455] 0 7455 109816 325 91 447 0 rsyslogd [26194250.313998] [ 7498] 996 7498 300955 41 535 265234 0 redis-server [26194250.314000] [ 7559] 999 7559 156319 94 63 981 0 polkitd [26194250.314002] [ 7577] 0 7577 143454 101 100 2695 0 tuned [26194250.314003] [ 7822] 0 7822 67726 10 80 248 0 sssd [26194250.314004] [ 7823] 0 7823 134088 951 183 4925 0 sssd_be [26194250.314006] [ 7824] 0 7824 73958 195 92 171 0 sssd_nss [26194250.314007] [ 7825] 0 7825 64757 216 78 167 0 sssd_pam [26194250.314009] [14104] 0 14104 110005 41 158 543 0 smbd [26194250.314010] [14106] 0 14106 103821 41 146 515 0 smbd-notifyd [26194250.314012] [14107] 0 14107 103940 47 148 509 0 cleanupd [26194250.314013] [14108] 0 14108 110001 33 148 549 0 lpqd [26194250.314014] [14117] 0 14117 87243 47 118 401 0 nmbd [26194250.314016] [23233] 0 23233 48857 180 49 4827 0 miniserv.pl [26194250.314017] [13608] 1768605311 13608 34728 230 18 105 0 screen [26194250.314019] [13609] 1768605311 13609 31772 235 18 201 0 bash [26194250.314020] [ 7203] 0 7203 110827 69 159 546 0 smbd [26194250.314021] [ 7293] 0 7293 48031 1 87 355 0 sshd [26194250.314023] [ 7296] 1768605311 7296 48417 41 86 723 0 sshd [26194250.314024] [ 7297] 1768605311 7297 31730 1 17 391 0 bash [26194250.314025] [ 7399] 0 7399 118856 85 159 944 0 smbd [26194250.314027] [ 7467] 1768605311 7467 34483 8 22 68 0 screen [26194250.314028] [ 8047] 0 8047 110262 185 155 449 0 smbd [26194250.314030] [ 8085] 0 8085 48031 5 89 351 0 sshd [26194250.314031] [ 8087] 1768605311 8087 48031 28 85 335 0 sshd [26194250.314032] [ 8088] 1768605311 8088 31767 245 17 191 0 bash [26194250.314034] [ 8367] 1768605311 8367 40484 0 32 713 0 vim [26194250.314035] [ 8368] 89 8368 25474 0 45 253 0 pickup [26194250.314037] [ 8392] 0 8392 70218 0 82 315 0 sudo [26194250.314038] [ 8395] 0 8395 28294 0 13 49 0 bash [26194250.314039] [ 8397] 0 8397 55725 1 63 12295 0 gdb [26194250.314041] [ 8399] 0 8399 713552 418045 1224 162923 0 python3.6 [26194250.314042] Out of memory: Kill process 8399 (python3.6) score 574 or sacrifice child [26194250.315180] Killed process 8399 (python3.6) total-vm:2854208kB, anon-rss:1672180kB, file-rss:0kB, shmem-rss:0kB |
|||
msg335457 - (view) | Author: STINNER Victor (vstinner) * | Date: 2019-02-13 15:47 | |
> [26194250.314042] Out of memory: Kill process 8399 (python3.6) score 574 or sacrifice child That's not a bug in Python: your application is using more memory than what is available in the system. |
|||
msg335471 - (view) | Author: Josh Rosenberg (josh.r) * | Date: 2019-02-13 18:07 | |
"your application is using more memory than what is available in the system." Well, it alone may not be using more memory, but the cumulative usage on the system is "too high" by whatever metric the OOM killer is using (IIRC the default rule is that actual committed memory must be less than swap size + 50% of RAM). The OOM killer is a strange and terrible beast, and the behavior varies based on configuration, relative memory usage of each process grouping, minimizing number of processes killed, etc. You can find deep implementation details on it (including how to disable a given process for consideration) here: https://linux-mm.org/OOM_Killer The real solution to problems like this usually amounts to: 1. Install more RAM. 2. Increase the size of your swap partition. Doesn't "fix" being shy of memory if you're actually using more memory than you have RAM, but allows you to handle overcommit (particularly for fork+exec scenarios where the forked process's memory will be freed momentarily) without the OOM killer getting involved, and to limp along slowly, without actually failing, if you actually allocate and use more memory than you have RAM. 3. Tweak the overcommit heuristics to allow more overcommit before invoking the OOM killer. 4. Disable overcommit entirely, so memory allocations fail immediately if sufficient backing storage is not available, rather than succeeding, only to invoke the OOM killer when the allocated memory gets used and the shortage is discovered. This is a good solution if the program(s) in question aren't poorly designed such that they try to allocate many GBs of memory up front even when they're unlikely to need it; unfortunately, there are commonly used programs that overallocate like this and render this solution non-viable if they're part of your setup. Regardless, this isn't a bug in Python itself. Any process that uses a lot of memory (Apache, MySQL) and hasn't explicitly removed itself from OOM killer consideration is going to look tasty when an OOM scenario occurs. |
|||
msg335514 - (view) | Author: Hinko Kocevar (hinxx) | Date: 2019-02-14 09:57 | |
Thank you for the valuable information about the OOM beast.. |
History | |||
---|---|---|---|
Date | User | Action | Args |
2022-04-11 14:59:11 | admin | set | github: 80169 |
2019-02-14 09:57:04 | hinxx | set | messages: + msg335514 |
2019-02-13 18:07:14 | josh.r | set | status: open -> closed nosy: + josh.r messages: + msg335471 resolution: not a bug stage: resolved |
2019-02-13 15:47:49 | vstinner | set | messages: + msg335457 |
2019-02-13 15:46:45 | hinxx | set | messages: + msg335456 |
2019-02-13 15:41:48 | hinxx | set | messages: + msg335454 |
2019-02-13 15:28:21 | vstinner | set | messages: + msg335451 |
2019-02-13 15:27:25 | vstinner | set | messages: + msg335450 |
2019-02-13 15:25:29 | vstinner | set | nosy:
+ vstinner messages: + msg335449 |
2019-02-13 14:46:09 | hinxx | create |