Issue23187
This issue tracker has been migrated to GitHub,
and is currently read-only.
For more information,
see the GitHub FAQs in the Python's Developer Guide.
Created on 2015-01-07 23:41 by karamanolev, last changed 2022-04-11 14:58 by admin. This issue is now closed.
Files | ||||
---|---|---|---|---|
File name | Uploaded | Description | Edit | |
Screenshot 2015-01-07 15.57.01.png | karamanolev, 2015-01-08 00:03 | Munin memory usage screenshot | ||
gc_next.png | michaelg, 2015-03-14 02:27 | |||
gc_refs.png | michaelg, 2015-03-15 10:13 |
Messages (28) | |||
---|---|---|---|
msg233608 - (view) | Author: Ivailo Karamanolev (karamanolev) | Date: 2015-01-07 23:41 | |
I have a program that's running Tornado on top of asyncio and uses mysql-connector-python to monitor a set of HTTP servers and save the status to MySQL. Initially I ran it with the current version of Python in the Ubuntu 14.04 repos, that's 3.4.0 and started getting segmentation faults after keeping it running for a few days. I compiled and installed 3.4.2 and the same thing keeps happening, with the same stack trace (per gdb). Running it with and without gdb attached gives the same results - seg fault after 3-4 days of running. The machine is running on a Xeon with ECC memory and doesn't show signs of illness otherwise, so it's probably not a hardware problem. Also, it's happening consistently when the programs runs for 3-4 days, with or without a system reboot. Program received signal SIGSEGV, Segmentation fault. 0x0000000000471ed7 in subtype_dealloc (self=0x7ffff16a3860) at Objects/typeobject.c:1128 1128 _PyObject_GC_UNTRACK(self); Stack trace: #0 0x0000000000471ed7 in subtype_dealloc (self=0x7ffff16a3860) at Objects/typeobject.c:1128 #1 0x00000000004563f4 in dict_dealloc (mp=0x7fffe95444c8) at Objects/dictobject.c:1383 #2 0x0000000000472140 in subtype_dealloc (self=0x7fffe8e1fcf8) at Objects/typeobject.c:1157 #3 0x000000000058ad22 in frame_dealloc (f=0x1145258) at Objects/frameobject.c:429 #4 0x0000000000588fd7 in gen_send_ex (gen=gen@entry=0x7fffe8c5cea0, gen=gen@entry=0x7fffe8c5cea0, exc=0, arg=arg@entry=0x0) at Objects/genobject.c:145 #5 _PyGen_Send (gen=gen@entry=0x7fffe8c5cea0, arg=arg@entry=0x87fec0 <_Py_FalseStruct>) at Objects/genobject.c:158 #6 0x00000000004da329 in PyEval_EvalFrameEx (f=f@entry=0x11faaf8, throwflag=throwflag@entry=0) at Python/ceval.c:1906 #7 0x0000000000588e5c in gen_send_ex (gen=gen@entry=0x7fffe8c5ccf0, gen=gen@entry=0x7fffe8c5ccf0, exc=0, arg=arg@entry=0x7ffff1bb6420) at Objects/genobject.c:104 #8 _PyGen_Send (gen=gen@entry=0x7fffe8c5ccf0, arg=arg@entry=0x87fec0 <_Py_FalseStruct>) at Objects/genobject.c:158 #9 0x00000000004da329 in PyEval_EvalFrameEx (f=f@entry=0x10daa48, throwflag=throwflag@entry=0) at Python/ceval.c:1906 #10 0x0000000000588e5c in gen_send_ex (gen=gen@entry=0x7fffe8c5cc18, gen=gen@entry=0x7fffe8c5cc18, exc=0, arg=arg@entry=0x102aac8) at Objects/genobject.c:104 #11 _PyGen_Send (gen=gen@entry=0x7fffe8c5cc18, arg=arg@entry=0x87fec0 <_Py_FalseStruct>) at Objects/genobject.c:158 #12 0x00000000004da329 in PyEval_EvalFrameEx (f=f@entry=0x10da7f8, throwflag=throwflag@entry=0) at Python/ceval.c:1906 #13 0x0000000000588e5c in gen_send_ex (gen=0x7ffff16b65a0, gen=0x7ffff16b65a0, exc=0, arg=<optimized out>) at Objects/genobject.c:104 #14 _PyGen_Send (gen=0x7ffff16b65a0, arg=<optimized out>) at Objects/genobject.c:158 #15 0x00000000004d9d0b in call_function (oparg=<optimized out>, pp_stack=0x7fffffffd290) at Python/ceval.c:4221 #16 PyEval_EvalFrameEx (f=f@entry=0x10da568, throwflag=throwflag@entry=0) at Python/ceval.c:2836 #17 0x00000000004d28d4 in PyEval_EvalCodeEx (_co=0x7ffff1e0c030, globals=<optimized out>, locals=<optimized out>, args=<optimized out>, argcount=3, kws=0x12e38b8, kwcount=0, defs=0x7ffff1e0a8e0, defcount=2, kwdefs=0x0, closure=0x7ffff1e10588) at Python/ceval.c:3585 #18 0x00000000004d8e75 in fast_function (nk=<optimized out>, na=3, n=3, pp_stack=0x7fffffffd490, func=0x7ffff1e122f0) at Python/ceval.c:4341 #19 call_function (oparg=<optimized out>, pp_stack=0x7fffffffd490) at Python/ceval.c:4259 #20 PyEval_EvalFrameEx (f=f@entry=0x12e3708, throwflag=throwflag@entry=0) at Python/ceval.c:2836 #21 0x00000000004d28d4 in PyEval_EvalCodeEx (_co=0x7ffff1e0c0c0, globals=<optimized out>, locals=locals@entry=0x0, args=args@entry=0x7fffeae8a720, argcount=2, kws=kws@entry=0x0, kwcount=kwcount@entry=0, defs=defs@entry=0x0, defcount=defcount@entry=0, kwdefs=0x0, closure=0x0) at Python/ceval.c:3585 #22 0x000000000058c67c in function_call (func=0x7ffff1e12378, arg=0x7fffeae8a708, kw=0x0) at Objects/funcobject.c:632 #23 0x000000000042686a in PyObject_Call (func=func@entry=0x7ffff1e12378, arg=arg@entry=0x7fffeae8a708, kw=kw@entry=0x0) at Objects/abstract.c:2067 #24 0x00000000004d48a4 in ext_do_call (nk=<optimized out>, na=1, flags=<optimized out>, pp_stack=0x7fffffffd760, func=0x7ffff1e12378) at Python/ceval.c:4558 #25 PyEval_EvalFrameEx (f=f@entry=0x145ac48, throwflag=throwflag@entry=0) at Python/ceval.c:2876 #26 0x00000000004d9e82 in fast_function (nk=<optimized out>, na=1, n=1, pp_stack=0x7fffffffd8a0, func=0x7ffff1e2d488) at Python/ceval.c:4331 #27 call_function (oparg=<optimized out>, pp_stack=0x7fffffffd8a0) at Python/ceval.c:4259 #28 PyEval_EvalFrameEx (f=f@entry=0x11aee28, throwflag=throwflag@entry=0) at Python/ceval.c:2836 #29 0x00000000004d9e82 in fast_function (nk=<optimized out>, na=1, n=1, pp_stack=0x7fffffffd9f0, func=0x7ffff1bc1400) at Python/ceval.c:4331 #30 call_function (oparg=<optimized out>, pp_stack=0x7fffffffd9f0) at Python/ceval.c:4259 #31 PyEval_EvalFrameEx (f=f@entry=0x10d9c68, throwflag=throwflag@entry=0) at Python/ceval.c:2836 #32 0x00000000004d9e82 in fast_function (nk=<optimized out>, na=1, n=1, pp_stack=0x7fffffffdb40, func=0x7ffff1bc0268) at Python/ceval.c:4331 #33 call_function (oparg=<optimized out>, pp_stack=0x7fffffffdb40) at Python/ceval.c:4259 #34 PyEval_EvalFrameEx (f=f@entry=0xf9ed08, throwflag=throwflag@entry=0) at Python/ceval.c:2836 #35 0x00000000004d9e82 in fast_function (nk=<optimized out>, na=0, n=0, pp_stack=0x7fffffffdc90, func=0x7ffff16a4d08) at Python/ceval.c:4331 #36 call_function (oparg=<optimized out>, pp_stack=0x7fffffffdc90) at Python/ceval.c:4259 #37 PyEval_EvalFrameEx (f=f@entry=0x7ffff7f4b438, throwflag=throwflag@entry=0) at Python/ceval.c:2836 #38 0x00000000004dae2a in PyEval_EvalCodeEx (closure=0x0, kwdefs=0x0, defcount=0, defs=0x0, kwcount=0, kws=0x0, argcount=0, args=0x0, locals=locals@entry=0x7ffff7f95d50, globals=globals@entry=0x100000000, _co=0x7ffff7ee0150, _co@entry=0x8ac460) at Python/ceval.c:3585 #39 PyEval_EvalCode (co=co@entry=0x7ffff7ee0150, globals=globals@entry=0x7ffff7f48408, locals=locals@entry=0x7ffff7f48408) at Python/ceval.c:773 #40 0x000000000050a8f8 in run_mod (arena=0x95d9a0, flags=0x7fffffffdee0, locals=0x7ffff7f48408, globals=0x7ffff7f48408, filename=0x7ffff7e3f540, mod=0x8fb1e0) at Python/pythonrun.c:2180 #41 PyRun_FileExFlags (fp=0x8fa9c0, filename_str=<optimized out>, start=<optimized out>, globals=0x7ffff7f48408, locals=0x7ffff7f48408, closeit=1, flags=0x7fffffffdee0) at Python/pythonrun.c:2133 #42 0x000000000050c8f5 in PyRun_SimpleFileExFlags (fp=0x8fa9c0, filename=<optimized out>, closeit=1, flags=0x7fffffffdee0) at Python/pythonrun.c:1606 #43 0x000000000041edc4 in run_file (p_cf=0x7fffffffdee0, filename=0x8ac2a0 L"./torrent_manager.py", fp=0x8fa9c0) at Modules/main.c:319 #44 Py_Main (argc=argc@entry=2, argv=argv@entry=0x8ab010) at Modules/main.c:751 #45 0x000000000041b227 in main (argc=2, argv=<optimized out>) at ./Modules/python.c:69 |
|||
msg233609 - (view) | Author: Guido van Rossum (gvanrossum) * | Date: 2015-01-07 23:48 | |
That's going to be really hard to debug. Is it always the same stack trace? Could it be slowly running out of memory? (The consistent time until it happens points to this.) A segfault in dealloc might point to a refcounting bug. The rarity of the even might mean that it only happens in some rare case (maybe the refcount bug is in an error path that is only triggered by running out of memory somewhere?). Is mysql totally unsuspected? |
|||
msg233610 - (view) | Author: Ivailo Karamanolev (karamanolev) | Date: 2015-01-08 00:03 | |
Running out of memory is unlikely. There are many other processes running on the machine and none of them show signs of memory issues. In addition, the munin memory usage reporting doesn't show signs of steady growing or drops when the process crashes (munin screenshot attached). 3.4.0 and 3.4.2 on all runs showed exactly the same stack trace. AFAIK mysql-connector-python is pure Python, so even if the problem was there, I think it's still Python core. I'm not 100% on this though. |
|||
msg233611 - (view) | Author: Ivailo Karamanolev (karamanolev) | Date: 2015-01-08 00:08 | |
What I forgot to mention is that I'm using aiohttp for making lots of HTTP requests. http://bugs.python.org/issue21435 mentions asyncio and aiohttp and is somewhat memory management related so I thought I'm hitting a different manifestation of it and upgrading to 3.4.2 would fix it, but it didn't. I don't know enough about cpython to know if they are related. |
|||
msg233612 - (view) | Author: STINNER Victor (vstinner) * | Date: 2015-01-08 00:10 | |
> that's 3.4.0 I suggest you to upgrade to Python 3.4.2. Your issue may already be fixed. For example, there was the issue #21435 in the garbage collector. The issue was found with an asyncio code. Your crash occurs at "_PyObject_GC_UNTRACK(self);" which is also related to the garbage collector. |
|||
msg233613 - (view) | Author: Ivailo Karamanolev (karamanolev) | Date: 2015-01-08 00:13 | |
@haypo Maybe I didn't put it prominently enough, but I upgraded to 3.4.2 by downloading the source and installing it. The same crash with the same stack trace happens. |
|||
msg233614 - (view) | Author: STINNER Victor (vstinner) * | Date: 2015-01-08 00:18 | |
> upgrading to 3.4.2 would fix it, but it didn't Oh sorry, I missed the part where you wrote that you already tested Python 3.4.2. By the way, if you already compile your own Python: try to configure Python with ./configure --with-pydebug. It adds much more debug checks to check internal consistency. > Is mysql totally unsuspected? It looks like mysql-connector-python is written in pure Python. Are you using other external modules which are implemented in C? -- Would it be possible to share your source code to try to reproduce the issue? You might send it to me privately if you prefer. |
|||
msg233615 - (view) | Author: Ivailo Karamanolev (karamanolev) | Date: 2015-01-08 00:29 | |
> try to configure Python with ./configure --with-pydebug I'm installing 3.4.2 with this option right now and I'll reboot and run the script with gdb attached. I might be a few days before I have a result. > Are you using other external modules which are implemented in C The only module that I'm using that's implemented in C is ujson https://pypi.python.org/pypi/ujson . If you think this is relevant, I can easily try changing it to use the built-in json. > Would it be possible to share your source code I can share it privately, yes, but running it on your machine might be too complicated. Sharing the source code will probably be only useful for you to look at what libraries I am using and how I'm using asyncio. What I can do is try running it on another machine with another setup. While it is currently running on a physical machine that I set up, I can run it in parallel on a OpenVZ VPS that is installed by the VPS provider, so it should give a completely independent environment. Would that help? If yes, should I also install 3.4.2 from source with --with-pydebug? |
|||
msg233616 - (view) | Author: STINNER Victor (vstinner) * | Date: 2015-01-08 00:49 | |
> The only module that I'm using that's implemented in C is ujson https://pypi.python.org/pypi/ujson . If you think this is relevant, I can easily try changing it to use the built-in json. To investigate an issue, we have to reduce the scope. If your program works with json, please use it instead yes. If the program still crash with the builtin json, it means that usjon is not the problem. If the program doesn't crash anymore, your issue may come from ujson and the investigate would be completly different... |
|||
msg233987 - (view) | Author: STINNER Victor (vstinner) * | Date: 2015-01-14 01:20 | |
@Ivailo: Any progress on your investigation? |
|||
msg233989 - (view) | Author: Ivailo Karamanolev (karamanolev) | Date: 2015-01-14 01:25 | |
@Victor It has been running for 5 days now on 3.4.2 using json instead of ujson. I want to give at least 10 days to crash and if it's still up, I plan to switch to simplejson to see how that will go, since json is quite slow. I'll keep playing with using different modules and when/if I have something significant that I can reproduce - I'll post here. If the problems are resolved, I'll close the ticket. If that doesn't sound reasonable, please let me know. |
|||
msg234747 - (view) | Author: STINNER Victor (vstinner) * | Date: 2015-01-26 14:19 | |
Since the issue looks to be related to a third party module (ujson), I close this issue. Note: On UNIX, asyncio is fully implemented in Python, I don't see how it could corrupt internal Python structures. |
|||
msg237742 - (view) | Author: Michael Goldish (michaelg) | Date: 2015-03-10 10:44 | |
I'm not sure where to post this as the issue is closed: I've encountered the same problem. I don't have a stack trace to share at the moment but I know it's almost (or entirely) identical. My context: - Windows - Happens with Python 3.4.2 and 3.4.3 (and possibly earlier versions) - Pure Python (no external binary modules whatsoever) - Pure asyncio (no Tornado) - Not using json/ujson - Plenty of memory available The crash seems to occur at random, with anything between 1 and 20 days separating consecutive crashes. I have core dumps if you're interested but am not sure where to upload them. |
|||
msg237745 - (view) | Author: STINNER Victor (vstinner) * | Date: 2015-03-10 11:05 | |
> I'm not sure where to post this as the issue is closed: Please open a new issue. > I've encountered the same problem. Why do you consider that it's the same issue? > The crash seems to occur at random, with anything between 1 and 20 days separating consecutive crashes. Outch. I hate such random bugs. > I have core dumps if you're interested but am not sure where to upload them. I don't know how to analyze a Windows core dump. Sorry, but without more information, it will be almost impossible to fix your issue. You can try to use faulthandler to write the traceback of all Python threads where the crash occurred: https://docs.python.org/dev/library/faulthandler.html#faulthandler.enable Write the ouput into a file. The output is text. Brian Curtin wrote a similar tool but more specific to Windows: https://bitbucket.org/briancurtin/minidumper/ "minidumper writes "minidumps" for post-mortem analysis of crashes in Python or its extensions." Problem: I don't know how to use such dump!? I guess that Visual Studio can load such minidump files. https://msdn.microsoft.com/en-us/library/windows/desktop/ee416349%28v=vs.85%29.aspx#analyzing_a_minidump Did you see some errors or warnings when running your application with asyncio in debug mode? https://docs.python.org/dev/library/asyncio-dev.html Please use at least Python 3.4.3 which embeds a recent version of asyncio, it enables more checks in debug mode. Can you maybe send me the source code of your application? |
|||
msg238061 - (view) | Author: Michael Goldish (michaelg) | Date: 2015-03-13 22:40 | |
> Why do you consider that it's the same issue? Because it's a very similar stack trace, the crash itself is in subtype_dealloc(), it happens once every few days, asyncio is involved, and the issue is relatively new - I couldn't find another bug report with a similar description. > I have core dumps if you're interested but am not sure where to upload them. > You can try to use faulthandler to write the traceback of all Python threads where the crash occurred I've enabled faulthandler, the program crashed, and nothing was written to the file. For reference, I caused an artificial crash in the same Python program (using ctypes) and the stack traces *were* written. I verified this several times, both with the real crash and the artificial one, and the real crash always left the file empty. > I guess that Visual Studio can load such minidump files. I've already looked at one of my dumps in Visual Studio but couldn't make sense of what was going on. The structures involved seemed to contain data that made no sense to me (e.g. some pointers had values like 0x10, which usually makes no sense for a pointer and seems to indicate memory corruption). > Did you see some errors or warnings when running your application with asyncio in debug mode? No, but I'll try. I doubt the problem is in asyncio itself because it's mostly written in Python. This looks like a garbage collection issue. > Can you maybe send me the source code of your application? I'll ask to see if that's permissible, but either way you will probably have a hard time reproducing the crash - the program is a server which maintains active TCP connections to more than 1000 client instances, writes large text files in background threads, pickles large objects... The frequency of the crash seems to be correlated to how busy the server is. Without a proper setup it'll be difficult to make it busy. |
|||
msg238065 - (view) | Author: Guido van Rossum (gvanrossum) * | Date: 2015-03-14 00:48 | |
Are we all in agreement that it's probably somehow running out of memory? |
|||
msg238066 - (view) | Author: Michael Goldish (michaelg) | Date: 2015-03-14 01:41 | |
I don't see a reason to assume the machine was running out of memory. After each crash the process was kept alive by one of those Windows crash dialogs - "the process terminated unexpectedly" or similar. I could see exactly how much memory was occupied by the process and it was never a significant percentage of the machine's total memory. It sometimes exceeded 2GB but I'm using a 64 bit build so this shouldn't be a problem. |
|||
msg238069 - (view) | Author: Michael Goldish (michaelg) | Date: 2015-03-14 02:27 | |
I caught another crash just now, this time in update_refs(). A stack trace is attached below. I still think this is the same issue. static void update_refs(PyGC_Head *containers) { PyGC_Head *gc = containers->gc.gc_next; 0000000067382D60 mov rdx,qword ptr [rcx] for (; gc != containers; gc = gc->gc.gc_next) { 0000000067382D63 cmp rdx,rcx 0000000067382D66 je update_refs+28h (67382D88h) 0000000067382D68 nop dword ptr [rax+rax] assert(_PyGCHead_REFS(gc) == GC_REACHABLE); _PyGCHead_SET_REFS(gc, Py_REFCNT(FROM_GC(gc))); 0000000067382D70 and qword ptr [rdx+10h],1 Here rdx is 0, which means we're writing to 0x10, probably because the member gc_refs is at offset 0x10 of struct PyGC_Head.gc. So I suppose containers->gc.gc_next was NULL. (In case this is helpful: somehow, possibly due to compiler optimizations, Visual Studio claims that containers->gc.gc_next is 0x34. I'm not sure what to make of this. It also claims that containers->gc.gc_prev->gc.gc_next is 0x3e, and that containers->gc.gc_prev->gc.gc_prev->gc.gc_next is 0x3e, and so on... gc_prev always seems fine and gc_next is always 0x3e, except for the first one which is 0x34. I'm attaching a screenshot to make this clearer.) Stack trace (Python 3.4.2, 64 bit, Windows): python34.dll!update_refs(_gc_head * containers=0x00000000676af8e0) Line 345 python34.dll!collect(int generation=-290088656, __int64 * n_collected=0x00000000f166e920, __int64 * n_uncollectable=0x0000000000000000, int nofail=0) Line 969 python34.dll!collect_with_callback(int generation=-290088656) Line 1141 python34.dll!_PyObject_GC_Malloc(unsigned __int64 basicsize=4046951880) Line 1739 python34.dll!_PyObject_GC_New(_typeobject * tp=0x0000000001c624f0) Line 1749 python34.dll!PyList_New(__int64 size=0) Line 159 + 0xc bytes python34.dll!PyEval_EvalFrameEx(_frame * f=0x00000000bab65b08, int throwflag=-244913096) Line 2346 python34.dll!fast_function(_object * func=0x0000000000000003, _object * * * pp_stack=0x00000000f77684e0, int n=102445400, int na=1732453353, int nk=0) Line 4332 python34.dll!call_function(_object * * * pp_stack=0x00000000f166ec29, int oparg=131) Line 4260 python34.dll!PyEval_EvalFrameEx(_frame * f=0x00000000601cbd68, int throwflag=-244912600) Line 2838 python34.dll!fast_function(_object * func=0x0000000000000003, _object * * * pp_stack=0x00000000f7768f28, int n=56017240, int na=1732453353, int nk=0) Line 4332 python34.dll!call_function(_object * * * pp_stack=0x00000000f166ee19, int oparg=131) Line 4260 python34.dll!PyEval_EvalFrameEx(_frame * f=0x00000000c65ff8f8, int throwflag=0) Line 2838 python34.dll!PyEval_EvalCodeEx(_object * _co=0x0000000000000002, _object * globals=0x0000000000000002, _object * locals=0x0000000000000000, _object * * args=0x000000000358d248, int argcount=2, _object * * kws=0x0000000001c50060, int kwcount=0, _object * * defs=0x0000000000000000, int defcount=0, _object * kwdefs=0x0000000000000000, _object * closure=0x0000000000000000) Line 3585 + 0xa bytes python34.dll!function_call(_object * func=0x000000000355f048, _object * arg=0x00000000f12f7688, _object * kw=0x0000000000000000) Line 638 + 0x45 bytes python34.dll!PyObject_Call(_object * func=0x00000000f12f7688, _object * arg=0x00000000f11c4d08, _object * kw=0x00000000f4e9ba58) Line 2068 python34.dll!ext_do_call(_object * func=0x000000000355f048, _object * * * pp_stack=0x00000000f166f0d9, int flags=-200649216, int na=1, int nk=0) Line 4558 + 0xe bytes python34.dll!PyEval_EvalFrameEx(_frame * f=0x00000000f40a5458, int throwflag=-244911400) Line 2879 python34.dll!fast_function(_object * func=0x0000000000000001, _object * * * pp_stack=0x00000000f1380f98, int n=45993224, int na=1732453353, int nk=0) Line 4332 python34.dll!call_function(_object * * * pp_stack=0x00000000f166f2c9, int oparg=131) Line 4260 python34.dll!PyEval_EvalFrameEx(_frame * f=0x00000000d6922548, int throwflag=0) Line 2838 python34.dll!PyEval_EvalCodeEx(_object * _co=0x0000000000000002, _object * globals=0x0000000000000002, _object * locals=0x0000000000000000, _object * * args=0x0000000002334200, int argcount=2, _object * * kws=0x0000000001c50060, int kwcount=0, _object * * defs=0x0000000000000000, int defcount=0, _object * kwdefs=0x0000000000000000, _object * closure=0x0000000000000000) Line 3585 + 0xa bytes python34.dll!function_call(_object * func=0x0000000002bdcbf8, _object * arg=0x00000000e41ef808, _object * kw=0x0000000000000000) Line 638 + 0x45 bytes python34.dll!PyObject_Call(_object * func=0x00000000e41ef808, _object * arg=0x00000000ef4ad308, _object * kw=0x00000000deda7148) Line 2068 python34.dll!ext_do_call(_object * func=0x0000000002bdcbf8, _object * * * pp_stack=0x00000000f166f589, int flags=-280305184, int na=0, int nk=0) Line 4558 + 0xe bytes python34.dll!PyEval_EvalFrameEx(_frame * f=0x00000000ef4ae048, int throwflag=-244910200) Line 2879 python34.dll!fast_function(_object * func=0x0000000000000001, _object * * * pp_stack=0x00000000ef4a7c50, int n=44825728, int na=1732453353, int nk=0) Line 4332 python34.dll!call_function(_object * * * pp_stack=0x00000000f166f779, int oparg=131) Line 4260 python34.dll!PyEval_EvalFrameEx(_frame * f=0x00000000db168948, int throwflag=-244909704) Line 2838 python34.dll!fast_function(_object * func=0x0000000000000001, _object * * * pp_stack=0x00000000ef4a7c50, int n=44826272, int na=1732453353, int nk=0) Line 4332 python34.dll!call_function(_object * * * pp_stack=0x00000000f166f969, int oparg=131) Line 4260 python34.dll!PyEval_EvalFrameEx(_frame * f=0x00000000ef4ab418, int throwflag=0) Line 2838 python34.dll!PyEval_EvalCodeEx(_object * _co=0x0000000000000001, _object * globals=0x0000000000000001, _object * locals=0x0000000000000000, _object * * args=0x0000000000000000, int argcount=1, _object * * kws=0x0000000000000000, int kwcount=0, _object * * defs=0x0000000000000000, int defcount=0, _object * kwdefs=0x0000000000000000, _object * closure=0x0000000000000000) Line 3585 + 0xa bytes python34.dll!function_call(_object * func=0x0000000002abfd08, _object * arg=0x00000000e22ebef0, _object * kw=0x0000000000000000) Line 638 + 0x45 bytes python34.dll!PyObject_Call(_object * func=0x00000000e22ebef0, _object * arg=0x0000000000000000, _object * kw=0x0000000001c50048) Line 2068 python34.dll!method_call(_object * func=0x0000000002d36148, _object * arg=0x0000000001c50048, _object * kw=0x0000000000000000) Line 348 python34.dll!PyObject_Call(_object * func=0x0000000001c50048, _object * arg=0x0000000000000000, _object * kw=0x0000000002d36148) Line 2068 python34.dll!PyEval_CallObjectWithKeywords(_object * func=0x00000000c0a69f40, _object * arg=0x00000000673b1100, _object * kw=0x0000000000000000) Line 4112 python34.dll!t_bootstrap(void * boot_raw=0x00000000dacc7d00) Line 1000 + 0x17 bytes python34.dll!bootstrap(void * call=0x00000000dacc7d00) Line 177 msvcr100.dll!_callthreadstartex() Line 314 + 0xd bytes msvcr100.dll!_threadstartex(void * ptd=0x0000000000000000) Line 292 + 0x5 bytes kernel32.dll!0000000076eb5a4d() [Frames below may be incorrect and/or missing, no symbols loaded for kernel32.dll] ntdll.dll!00000000775aba01() |
|||
msg238072 - (view) | Author: Guido van Rossum (gvanrossum) * | Date: 2015-03-14 04:42 | |
So what do you make of this? PyObject_GC_Malloc(unsigned __int64 basicsize=4046951880) That's nearly 4 GB. I somehow doubt your app is actually trying to allocate that much memory -- maybe the type object a few lines below in the stack is overwritten, or maybe the compiler is lying about everything. Is there any chance you can run an unoptimized build? On Fri, Mar 13, 2015 at 7:27 PM, Michael Goldish <report@bugs.python.org> wrote: > > Michael Goldish added the comment: > > I caught another crash just now, this time in update_refs(). A stack trace > is attached below. I still think this is the same issue. > > static void > update_refs(PyGC_Head *containers) > { > PyGC_Head *gc = containers->gc.gc_next; > 0000000067382D60 mov rdx,qword ptr [rcx] > for (; gc != containers; gc = gc->gc.gc_next) { > 0000000067382D63 cmp rdx,rcx > 0000000067382D66 je update_refs+28h (67382D88h) > 0000000067382D68 nop dword ptr [rax+rax] > assert(_PyGCHead_REFS(gc) == GC_REACHABLE); > _PyGCHead_SET_REFS(gc, Py_REFCNT(FROM_GC(gc))); > 0000000067382D70 and qword ptr [rdx+10h],1 > > Here rdx is 0, which means we're writing to 0x10, probably because the > member gc_refs is at offset 0x10 of struct PyGC_Head.gc. So I suppose > containers->gc.gc_next was NULL. > > (In case this is helpful: somehow, possibly due to compiler optimizations, > Visual Studio claims that containers->gc.gc_next is 0x34. I'm not sure what > to make of this. It also claims that containers->gc.gc_prev->gc.gc_next is > 0x3e, and that containers->gc.gc_prev->gc.gc_prev->gc.gc_next is 0x3e, and > so on... gc_prev always seems fine and gc_next is always 0x3e, except for > the first one which is 0x34. I'm attaching a screenshot to make this > clearer.) > > Stack trace (Python 3.4.2, 64 bit, Windows): > > python34.dll!update_refs(_gc_head * containers=0x00000000676af8e0) Line > 345 > python34.dll!collect(int generation=-290088656, __int64 * > n_collected=0x00000000f166e920, __int64 * > n_uncollectable=0x0000000000000000, int nofail=0) Line 969 > python34.dll!collect_with_callback(int generation=-290088656) Line 1141 > python34.dll!_PyObject_GC_Malloc(unsigned __int64 basicsize=4046951880) > Line 1739 > python34.dll!_PyObject_GC_New(_typeobject * tp=0x0000000001c624f0) Line > 1749 > python34.dll!PyList_New(__int64 size=0) Line 159 + 0xc bytes > python34.dll!PyEval_EvalFrameEx(_frame * f=0x00000000bab65b08, int > throwflag=-244913096) Line 2346 > python34.dll!fast_function(_object * func=0x0000000000000003, _object * * > * pp_stack=0x00000000f77684e0, int n=102445400, int na=1732453353, int > nk=0) Line 4332 > python34.dll!call_function(_object * * * pp_stack=0x00000000f166ec29, int > oparg=131) Line 4260 > python34.dll!PyEval_EvalFrameEx(_frame * f=0x00000000601cbd68, int > throwflag=-244912600) Line 2838 > python34.dll!fast_function(_object * func=0x0000000000000003, _object * * > * pp_stack=0x00000000f7768f28, int n=56017240, int na=1732453353, int > nk=0) Line 4332 > python34.dll!call_function(_object * * * pp_stack=0x00000000f166ee19, int > oparg=131) Line 4260 > python34.dll!PyEval_EvalFrameEx(_frame * f=0x00000000c65ff8f8, int > throwflag=0) Line 2838 > python34.dll!PyEval_EvalCodeEx(_object * _co=0x0000000000000002, _object * > globals=0x0000000000000002, _object * locals=0x0000000000000000, _object * > * args=0x000000000358d248, int argcount=2, _object * * > kws=0x0000000001c50060, int kwcount=0, _object * * defs=0x0000000000000000, > int defcount=0, _object * kwdefs=0x0000000000000000, _object * > closure=0x0000000000000000) Line 3585 + 0xa bytes > python34.dll!function_call(_object * func=0x000000000355f048, _object * > arg=0x00000000f12f7688, _object * kw=0x0000000000000000) Line 638 + 0x45 > bytes > python34.dll!PyObject_Call(_object * func=0x00000000f12f7688, _object * > arg=0x00000000f11c4d08, _object * kw=0x00000000f4e9ba58) Line 2068 > python34.dll!ext_do_call(_object * func=0x000000000355f048, _object * * * > pp_stack=0x00000000f166f0d9, int flags=-200649216, int na=1, int nk=0) > Line 4558 + 0xe bytes > python34.dll!PyEval_EvalFrameEx(_frame * f=0x00000000f40a5458, int > throwflag=-244911400) Line 2879 > python34.dll!fast_function(_object * func=0x0000000000000001, _object * * > * pp_stack=0x00000000f1380f98, int n=45993224, int na=1732453353, int > nk=0) Line 4332 > python34.dll!call_function(_object * * * pp_stack=0x00000000f166f2c9, int > oparg=131) Line 4260 > python34.dll!PyEval_EvalFrameEx(_frame * f=0x00000000d6922548, int > throwflag=0) Line 2838 > python34.dll!PyEval_EvalCodeEx(_object * _co=0x0000000000000002, _object * > globals=0x0000000000000002, _object * locals=0x0000000000000000, _object * > * args=0x0000000002334200, int argcount=2, _object * * > kws=0x0000000001c50060, int kwcount=0, _object * * defs=0x0000000000000000, > int defcount=0, _object * kwdefs=0x0000000000000000, _object * > closure=0x0000000000000000) Line 3585 + 0xa bytes > python34.dll!function_call(_object * func=0x0000000002bdcbf8, _object * > arg=0x00000000e41ef808, _object * kw=0x0000000000000000) Line 638 + 0x45 > bytes > python34.dll!PyObject_Call(_object * func=0x00000000e41ef808, _object * > arg=0x00000000ef4ad308, _object * kw=0x00000000deda7148) Line 2068 > python34.dll!ext_do_call(_object * func=0x0000000002bdcbf8, _object * * * > pp_stack=0x00000000f166f589, int flags=-280305184, int na=0, int nk=0) > Line 4558 + 0xe bytes > python34.dll!PyEval_EvalFrameEx(_frame * f=0x00000000ef4ae048, int > throwflag=-244910200) Line 2879 > python34.dll!fast_function(_object * func=0x0000000000000001, _object * * > * pp_stack=0x00000000ef4a7c50, int n=44825728, int na=1732453353, int > nk=0) Line 4332 > python34.dll!call_function(_object * * * pp_stack=0x00000000f166f779, int > oparg=131) Line 4260 > python34.dll!PyEval_EvalFrameEx(_frame * f=0x00000000db168948, int > throwflag=-244909704) Line 2838 > python34.dll!fast_function(_object * func=0x0000000000000001, _object * * > * pp_stack=0x00000000ef4a7c50, int n=44826272, int na=1732453353, int > nk=0) Line 4332 > python34.dll!call_function(_object * * * pp_stack=0x00000000f166f969, int > oparg=131) Line 4260 > python34.dll!PyEval_EvalFrameEx(_frame * f=0x00000000ef4ab418, int > throwflag=0) Line 2838 > python34.dll!PyEval_EvalCodeEx(_object * _co=0x0000000000000001, _object * > globals=0x0000000000000001, _object * locals=0x0000000000000000, _object * > * args=0x0000000000000000, int argcount=1, _object * * > kws=0x0000000000000000, int kwcount=0, _object * * defs=0x0000000000000000, > int defcount=0, _object * kwdefs=0x0000000000000000, _object * > closure=0x0000000000000000) Line 3585 + 0xa bytes > python34.dll!function_call(_object * func=0x0000000002abfd08, _object * > arg=0x00000000e22ebef0, _object * kw=0x0000000000000000) Line 638 + 0x45 > bytes > python34.dll!PyObject_Call(_object * func=0x00000000e22ebef0, _object * > arg=0x0000000000000000, _object * kw=0x0000000001c50048) Line 2068 > python34.dll!method_call(_object * func=0x0000000002d36148, _object * > arg=0x0000000001c50048, _object * kw=0x0000000000000000) Line 348 > python34.dll!PyObject_Call(_object * func=0x0000000001c50048, _object * > arg=0x0000000000000000, _object * kw=0x0000000002d36148) Line 2068 > python34.dll!PyEval_CallObjectWithKeywords(_object * > func=0x00000000c0a69f40, _object * arg=0x00000000673b1100, _object * > kw=0x0000000000000000) Line 4112 > python34.dll!t_bootstrap(void * boot_raw=0x00000000dacc7d00) Line 1000 + > 0x17 bytes > python34.dll!bootstrap(void * call=0x00000000dacc7d00) Line 177 > msvcr100.dll!_callthreadstartex() Line 314 + 0xd bytes > msvcr100.dll!_threadstartex(void * ptd=0x0000000000000000) Line 292 + 0x5 > bytes > kernel32.dll!0000000076eb5a4d() > [Frames below may be incorrect and/or missing, no symbols loaded for > kernel32.dll] > ntdll.dll!00000000775aba01() > > ---------- > Added file: http://bugs.python.org/file38478/gc_next.png > > _______________________________________ > Python tracker <report@bugs.python.org> > <http://bugs.python.org/issue23187> > _______________________________________ > |
|||
msg238074 - (view) | Author: STINNER Victor (vstinner) * | Date: 2015-03-14 08:29 | |
It looks like you are running your app on Windows. Are you using the proactor event loop? I guess yes since you have more than 1000 clients and select() is limited to 500 sockets. In Python 3.4.3, I fixed a *lot* of crashes and race conditions in the proactor event loop. There are maybe more race conditions. Are you running subprocesses with asyncio? Le 14 mars 2015 05:42, "Guido van Rossum" <report@bugs.python.org> a écrit : > PyObject_GC_Malloc(unsigned __int64 basicsize=4046951880) > > That's nearly 4 GB. I somehow doubt your app is actually trying to allocate > that much memory The caller is PyList_New(0). When using a debugger on an optimized build, you should not trust the debugger. For example, variables can be stored in registers. Displaying values in the old frames can just display junk. In gdb, you see <optimized>. The fast call convention uses registers to pass first parameters to functions. I'm not sure that debuggers handle this case correctly. Etc. |
|||
msg238082 - (view) | Author: Michael Goldish (michaelg) | Date: 2015-03-14 12:25 | |
> That's nearly 4 GB. I somehow doubt your app is actually trying to allocate that much memory There's no reason for the app to allocate that much memory in a single call. It's using almost 4 GB of memory in total today, but that makes sense given its load today. > Is there any you can run an unoptimized build? I'll try doing that now. I suppose I should build it myself with Py_DEBUG enabled. |
|||
msg238083 - (view) | Author: Michael Goldish (michaelg) | Date: 2015-03-14 13:02 | |
> It looks like you are running your app on Windows. Are you using the proactor event loop? Yes. > In Python 3.4.3, I fixed a *lot* of crashes and race conditions in the proactor event loop. There are maybe more race conditions. I've already tried Python 3.4.3 and got the same crash. The one described in my long message with the stack trace is from 3.4.2, but I got that same stack trace (or a very similar one) with 3.4.3. I have a core dump from that crash as well so I can provide the exact stack trace if necessary. > Are you running subprocesses with asyncio? I can't recall running any subprocesses in the server program. The clients do that very often, but they're not the ones crashing... > The caller is PyList_New(0). I suppose this creates an empty list? I managed to get a stack trace from faulthandler somehow and it points to this line in one of the threads: d = rg.item_counts_by_params.setdefault(params, {}) This runs in a thread (managed by a ThreadPoolExecutor) that computes stuff and writes it to a file. It's in a loop that runs ~200000 times per job. Several lines above there's a similar line that passes an empty list as the 2nd parameter to setdefault(). The other threads are all in threading.wait(), probably doing nothing interesting. One thread is in _run_once(). Unfortunately I don't have any additional information about this particular crash (no dump or stack trace), but it's not unreasonable to assume other crashes occurred during creation of new lists/dicts in a similar manner. |
|||
msg238139 - (view) | Author: Michael Goldish (michaelg) | Date: 2015-03-15 10:13 | |
OK, I caught the crash with a debug build of Python 3.4.3. I have a core dump and even the process itself still alive in memory. I can provide any information you need. I can also explain how to debug a core dump with Visual Studio, if necessary. This time the crash was here: static void update_refs(PyGC_Head *containers) { PyGC_Head *gc = containers->gc.gc_next; for (; gc != containers; gc = gc->gc.gc_next) { assert(_PyGCHead_REFS(gc) == GC_REACHABLE); _PyGCHead_SET_REFS(gc, Py_REFCNT(FROM_GC(gc))); /* Python's cyclic gc should never see an incoming refcount * of 0: if something decref'ed to 0, it should have been * deallocated immediately at that time. * Possible cause (if the assert triggers): a tp_dealloc * routine left a gc-aware object tracked during its teardown * phase, and did something-- or allowed something to happen -- * that called back into Python. gc can trigger then, and may * see the still-tracked dying object. Before this assert * was added, such mistakes went on to allow gc to try to * delete the object again. In a debug build, that caused * a mysterious segfault, when _Py_ForgetReference tried * to remove the object from the doubly-linked list of all * objects a second time. In a release build, an actual * double deallocation occurred, which leads to corruption * of the allocator's internal bookkeeping pointers. That's * so serious that maybe this should be a release-build * check instead of an assert? */ ------> assert(_PyGCHead_REFS(gc) != 0); <------- crash } } In the calling frame I can see that update_refs() is called with a PyGC_Head object whose gc_refs is 0, and according to the paragraph above that shouldn't happen. A screenshot is attached. Stack trace: msvcr100d.dll!_NMSG_WRITE(int rterrnum=10) Line 217 msvcr100d.dll!abort() Line 71 msvcr100d.dll!_wassert(const wchar_t * expr=0x0000000063220618, const wchar_t * filename=0x00000000632205e0, unsigned int lineno=364) Line 153 python34_d.dll!update_refs(_gc_head * containers=0x000000006338cfc0) Line 364 + 0x2b bytes python34_d.dll!collect(int generation=0, __int64 * n_collected=0x000000012beee530, __int64 * n_uncollectable=0x000000012beee520, int nofail=0) Line 969 python34_d.dll!collect_with_callback(int generation=0) Line 1140 + 0x16 bytes python34_d.dll!collect_generations() Line 1163 + 0x9 bytes python34_d.dll!_PyObject_GC_Malloc(unsigned __int64 basicsize=56) Line 1739 python34_d.dll!_PyObject_GC_New(_typeobject * tp=0x00000000633ce280) Line 1748 + 0xe bytes python34_d.dll!PyList_New(__int64 size=0) Line 159 + 0xc bytes python34_d.dll!PyEval_EvalFrameEx(_frame * f=0x000000010f387658, int throwflag=3) Line 2347 + 0x8 bytes python34_d.dll!fast_function(_object * func=0x0000000000000003, _object * * * pp_stack=0x0000000129e69948, int n=737077536, int na=1661137695, int nk=0) Line 4335 python34_d.dll!call_function(_object * * * pp_stack=0x000000012beee919, int oparg=131) Line 4263 python34_d.dll!PyEval_EvalFrameEx(_frame * f=0x000000011c077528, int throwflag=3) Line 2840 python34_d.dll!fast_function(_object * func=0x0000000000000003, _object * * * pp_stack=0x0000000089a53190, int n=737078048, int na=1661137695, int nk=0) Line 4335 python34_d.dll!call_function(_object * * * pp_stack=0x000000012beeeb19, int oparg=131) Line 4263 python34_d.dll!PyEval_EvalFrameEx(_frame * f=0x000000010369f2a8, int throwflag=0) Line 2840 python34_d.dll!PyEval_EvalCodeEx(_object * _co=0x0000000100000002, _object * globals=0x0000000100000002, _object * locals=0x000000001fa5e218, _object * * args=0x0000000000730088, int argcount=2, _object * * kws=0x0000000000730080, int kwcount=0, _object * * defs=0x0000000000000000, int defcount=0, _object * kwdefs=0x0000000000000000, _object * closure=0x0000000000000000) Line 3588 + 0xf bytes python34_d.dll!function_call(_object * func=0x0000000003c10058, _object * arg=0x000000001fa5e218, _object * kw=0x00000001296360c8) Line 638 + 0x8a bytes python34_d.dll!PyObject_Call(_object * func=0x0000000003c10058, _object * arg=0x000000001fa5e218, _object * kw=0x00000001296360c8) Line 2040 + 0x13 bytes python34_d.dll!ext_do_call(_object * func=0x00000000c42c25a0, _object * * * pp_stack=0x000000012beeee49, int flags=62980184, int na=1, int nk=0) Line 4561 + 0xe bytes python34_d.dll!PyEval_EvalFrameEx(_frame * f=0x00000000fa4eab18, int throwflag=1) Line 2880 python34_d.dll!fast_function(_object * func=0x0000000000000001, _object * * * pp_stack=0x000000003b42f398, int n=737079376, int na=1661137695, int nk=0) Line 4335 python34_d.dll!call_function(_object * * * pp_stack=0x000000012beef049, int oparg=131) Line 4263 python34_d.dll!PyEval_EvalFrameEx(_frame * f=0x000000010f38ec28, int throwflag=0) Line 2840 python34_d.dll!PyEval_EvalCodeEx(_object * _co=0x0000000100000002, _object * globals=0x0000000100000002, _object * locals=0x000000012a1e5d78, _object * * args=0x0000000000730088, int argcount=2, _object * * kws=0x0000000000730080, int kwcount=0, _object * * defs=0x0000000000000000, int defcount=0, _object * kwdefs=0x0000000000000000, _object * closure=0x0000000000000000) Line 3588 + 0xf bytes python34_d.dll!function_call(_object * func=0x00000000036d2d48, _object * arg=0x000000012a1e5d78, _object * kw=0x000000012a1f1918) Line 638 + 0x8a bytes python34_d.dll!PyObject_Call(_object * func=0x00000000036d2d48, _object * arg=0x000000012a1e5d78, _object * kw=0x000000012a1f1918) Line 2040 + 0x13 bytes python34_d.dll!ext_do_call(_object * func=0x00000000033f6040, _object * * * pp_stack=0x000000012beef379, int flags=57486664, int na=0, int nk=0) Line 4561 + 0xe bytes python34_d.dll!PyEval_EvalFrameEx(_frame * f=0x000000010f80f118, int throwflag=1) Line 2880 python34_d.dll!fast_function(_object * func=0x0000000000000001, _object * * * pp_stack=0x000000012a1ee878, int n=737080704, int na=1661137695, int nk=0) Line 4335 python34_d.dll!call_function(_object * * * pp_stack=0x000000012beef579, int oparg=131) Line 4263 python34_d.dll!PyEval_EvalFrameEx(_frame * f=0x000000010f1c8b98, int throwflag=1) Line 2840 python34_d.dll!fast_function(_object * func=0x0000000000000001, _object * * * pp_stack=0x000000012a1ee878, int n=737081216, int na=1661137695, int nk=0) Line 4335 python34_d.dll!call_function(_object * * * pp_stack=0x000000012beef779, int oparg=131) Line 4263 python34_d.dll!PyEval_EvalFrameEx(_frame * f=0x000000010f80ee88, int throwflag=0) Line 2840 python34_d.dll!PyEval_EvalCodeEx(_object * _co=0x0000000000000001, _object * globals=0x0000000100000001, _object * locals=0x0000000000730058, _object * * args=0x0000000000000000, int argcount=1, _object * * kws=0x0000000000000000, int kwcount=0, _object * * defs=0x0000000000000000, int defcount=0, _object * kwdefs=0x0000000000000000, _object * closure=0x0000000000000000) Line 3588 + 0xf bytes python34_d.dll!function_call(_object * func=0x0000000003404058, _object * arg=0x000000012a1ee058, _object * kw=0x0000000000000000) Line 638 + 0x8a bytes python34_d.dll!PyObject_Call(_object * func=0x0000000003404058, _object * arg=0x000000012a1ee058, _object * kw=0x0000000000000000) Line 2040 + 0x13 bytes python34_d.dll!method_call(_object * func=0x0000000003404058, _object * arg=0x000000012a1ee058, _object * kw=0x0000000000000000) Line 347 + 0x1d bytes python34_d.dll!PyObject_Call(_object * func=0x0000000003826fa8, _object * arg=0x0000000000730058, _object * kw=0x0000000000000000) Line 2040 + 0x13 bytes python34_d.dll!PyEval_CallObjectWithKeywords(_object * func=0x0000000000000000, _object * arg=0x0000000000000000, _object * kw=0x0000000000000000) Line 4115 python34_d.dll!t_bootstrap(void * boot_raw=0x00000000e2e0df70) Line 1000 + 0x20 bytes python34_d.dll!bootstrap(void * call=0x000000000070ef20) Line 176 msvcr100d.dll!_callthreadstartex() Line 314 + 0x17 bytes msvcr100d.dll!_threadstartex(void * ptd=0x000000007be2d110) Line 297 kernel32.dll!0000000076eb5a4d() [Frames below may be incorrect and/or missing, no symbols loaded for kernel32.dll] ntdll.dll!00000000775aba01() |
|||
msg238147 - (view) | Author: STINNER Victor (vstinner) * | Date: 2015-03-15 15:02 | |
(Since you chose to comment this closed issue instead of opening a new one, I reopen this issue.) |
|||
msg238150 - (view) | Author: STINNER Victor (vstinner) * | Date: 2015-03-15 15:25 | |
>> Did you see some errors or warnings when running your application with asyncio in debug mode? > No, but I'll try. I doubt the problem is in asyncio itself because it's mostly written in Python. This looks like a garbage collection issue. Sorry, you're wrong: the proactor event loop heavily uses the _overlapped module which is implemented in C. A crash in the garbage collector is more likely a bug in asyncio/your application, than a bug in Python itself. Please enable asyncio debug mode, ResourceWarning, enable DEBUG log level, and check if you get some errors. For example, "GetQueuedCompletionStatus() returned an unexpected event" usually means that you will quickly get a memory corruption... If possible, write all asyncio logs at DEBUG log level and attach them to the issue. Using the logging module, it's possible to add a custom handler for "asyncio". Example: --- import asyncio import logging logging.basicConfig() logger = asyncio.log.logger output = open("asyncio.log", "a") handler = logging.StreamHandler(output) handler.setLevel(logging.DEBUG) logger.setLevel(logging.DEBUG) logger.addHandler(handler) asyncio.log.logger.debug("plop") output.close() --- > python34_d.dll!PyList_New(__int64 size=0) Line 159 + 0xc bytes So the crash occurred again during a call to PyList_New(0) which triggerred a garbage collection. The backtrace looks more reliable: _PyObject_GC_Malloc(56). > msvcr100d.dll!_wassert(const wchar_t * expr=0x0000000063220618, const wchar_t * filename=0x00000000632205e0, unsigned int lineno=364) Line 153 > python34_d.dll!update_refs(_gc_head * containers=0x000000006338cfc0) Line 364 + 0x2b bytes This assertion contains a long comment which explains when the assertion can fail. /* Python's cyclic gc should never see an incoming refcount * of 0: if something decref'ed to 0, it should have been * deallocated immediately at that time. * Possible cause (if the assert triggers): a tp_dealloc * routine left a gc-aware object tracked during its teardown * phase, and did something-- or allowed something to happen -- * that called back into Python. gc can trigger then, and may * see the still-tracked dying object. Before this assert * was added, such mistakes went on to allow gc to try to * delete the object again. In a debug build, that caused * a mysterious segfault, when _Py_ForgetReference tried * to remove the object from the doubly-linked list of all * objects a second time. In a release build, an actual * double deallocation occurred, which leads to corruption * of the allocator's internal bookkeeping pointers. That's * so serious that maybe this should be a release-build * check instead of an assert? */ assert(_PyGCHead_REFS(gc) != 0); To come back to asyncio and the proactor event loop: the implementations of the proactor event loop is still young and probably unstable. Would it be possible to test again with fewer clients (500 clients or less to not hit the select() limit on Windows) with selector event loop? A crash in the garbage collector can be caused by a memory corruption. I fixed a lot of memory corruption caused by internal objects uesd in the proactor event loop. Can you check if asyncio.windows_events._BaseWaitHandleFuture is used? This class is used by IocpProactor.wait_for_handle() which is called by _WindowsSubprocessTransport. But you wrote you the server (where the crash occurs) doesn't run subprocesses. I never tried the proactor event loop with threads. Are you using asyncio code outside the main thread? asyncio is not thread-safe at all :-p Are you using a single event loop or one event loop per thread? |
|||
msg238151 - (view) | Author: STINNER Victor (vstinner) * | Date: 2015-03-15 15:26 | |
Bugfixes of asyncio 3.4.3, a lot of them are related to the proactor event loop (and most of them leaded to crashes): https://code.google.com/p/tulip/source/browse/ChangeLog#66 |
|||
msg238155 - (view) | Author: Michael Goldish (michaelg) | Date: 2015-03-15 17:11 | |
> Sorry, you're wrong: the proactor event loop heavily uses the _overlapped module which is implemented in C. A crash in the garbage collector is more likely a bug in asyncio/your application, than a bug in Python itself. I'm aware of that. I assumed the original crash reported in this thread was the same as mine but I can't be sure of that. And that crash was on Linux, where asyncio is pure Python AFAIK. > Please enable asyncio debug mode, ResourceWarning, enable DEBUG log level, and check if you get some errors. For example, "GetQueuedCompletionStatus() returned an unexpected event" usually means that you will quickly get a memory corruption... I've already done that but admittedly not for very long because the server ran so slowly that it was unresponsive. I didn't see any error or warning messages, but I did see numerous "Executing <Handle _ProactorReadPipeTransport._loop_reading(<_OverlappedF...events.py:451>)" messages. > Would it be possible to test again with fewer clients (500 clients or less to not hit the select() limit on Windows) with selector event loop? I'll see if I can try that, but I guess reproducing the crash can take much longer with 500 clients. > Can you check if asyncio.windows_events._BaseWaitHandleFuture is used? This class is used by IocpProactor.wait_for_handle() which is called by _WindowsSubprocessTransport. But you wrote you the server (where the crash occurs) doesn't run subprocesses. It's not used. I put print statements and an exit() in the constructor of _BaseWaitHandleFuture and it wasn't triggered. > Are you using asyncio code outside the main thread? No. > asyncio is not thread-safe at all :-p Are you using a single event loop or one event loop per thread? A single event loop, and I've made sure that all code running outside the main thread doesn't call any asyncio functions. |
|||
msg245426 - (view) | Author: STINNER Victor (vstinner) * | Date: 2015-06-16 21:14 | |
I'm sorry, but even if it's a real bug, we don't have enough information nor any scenario to reproduce the bug, so it's not possible to investigate the bug. I close the issue. |
History | |||
---|---|---|---|
Date | User | Action | Args |
2022-04-11 14:58:11 | admin | set | github: 67376 |
2015-06-16 21:14:41 | vstinner | set | status: open -> closed resolution: out of date messages: + msg245426 |
2015-03-15 17:11:49 | michaelg | set | messages: + msg238155 |
2015-03-15 15:26:31 | vstinner | set | messages: + msg238151 |
2015-03-15 15:25:52 | vstinner | set | messages: + msg238150 |
2015-03-15 15:02:51 | vstinner | set | status: closed -> open resolution: not a bug -> (no value) messages: + msg238147 |
2015-03-15 10:13:29 | michaelg | set | files:
+ gc_refs.png messages: + msg238139 |
2015-03-14 13:02:51 | michaelg | set | messages: + msg238083 |
2015-03-14 12:25:36 | michaelg | set | messages: + msg238082 |
2015-03-14 08:29:58 | vstinner | set | messages: + msg238074 |
2015-03-14 04:42:42 | gvanrossum | set | messages: + msg238072 |
2015-03-14 02:27:46 | michaelg | set | files:
+ gc_next.png messages: + msg238069 |
2015-03-14 01:41:39 | michaelg | set | messages: + msg238066 |
2015-03-14 00:48:23 | gvanrossum | set | messages: + msg238065 |
2015-03-13 22:40:12 | michaelg | set | messages: + msg238061 |
2015-03-10 11:05:18 | vstinner | set | messages: + msg237745 |
2015-03-10 10:44:41 | michaelg | set | nosy:
+ michaelg messages: + msg237742 |
2015-01-26 14:19:52 | vstinner | set | status: open -> closed resolution: not a bug messages: + msg234747 |
2015-01-14 01:25:16 | karamanolev | set | messages: + msg233989 |
2015-01-14 01:20:55 | vstinner | set | messages: + msg233987 |
2015-01-08 18:10:27 | r-englund | set | nosy:
+ r-englund |
2015-01-08 00:49:13 | vstinner | set | messages: + msg233616 |
2015-01-08 00:29:21 | karamanolev | set | messages: + msg233615 |
2015-01-08 00:18:45 | vstinner | set | messages: + msg233614 |
2015-01-08 00:13:08 | karamanolev | set | messages: + msg233613 |
2015-01-08 00:10:43 | vstinner | set | messages: + msg233612 |
2015-01-08 00:08:38 | karamanolev | set | messages: + msg233611 |
2015-01-08 00:03:58 | karamanolev | set | files:
+ Screenshot 2015-01-07 15.57.01.png messages: + msg233610 |
2015-01-07 23:48:14 | gvanrossum | set | messages: + msg233609 |
2015-01-07 23:41:35 | karamanolev | create |