classification
Title: Segmentation fault, possibly asyncio related
Type: crash Stage:
Components: asyncio, Interpreter Core Versions: Python 3.4
process
Status: closed Resolution: out of date
Dependencies: Superseder:
Assigned To: Nosy List: gvanrossum, haypo, karamanolev, michaelg, r-englund, yselivanov
Priority: normal Keywords:

Created on 2015-01-07 23:41 by karamanolev, last changed 2015-06-16 21:14 by haypo. This issue is now closed.

Files
File name Uploaded Description Edit
Screenshot 2015-01-07 15.57.01.png karamanolev, 2015-01-08 00:03 Munin memory usage screenshot
gc_next.png michaelg, 2015-03-14 02:27
gc_refs.png michaelg, 2015-03-15 10:13
Messages (28)
msg233608 - (view) Author: Ivailo Karamanolev (karamanolev) Date: 2015-01-07 23:41
I have a program that's running Tornado on top of asyncio and uses mysql-connector-python to monitor a set of HTTP servers and save the status to MySQL. Initially I ran it with the current version of Python in the Ubuntu 14.04 repos, that's 3.4.0 and started getting segmentation faults after keeping it running for a few days. I compiled and installed 3.4.2 and the same thing keeps happening, with the same stack trace (per gdb). Running it with and without gdb attached gives the same results - seg fault after 3-4 days of running.

The machine is running on a Xeon with ECC memory and doesn't show signs of illness otherwise, so it's probably not a hardware problem. Also, it's happening consistently when the programs runs for 3-4 days, with or without a system reboot.

Program received signal SIGSEGV, Segmentation fault.
0x0000000000471ed7 in subtype_dealloc (self=0x7ffff16a3860) at Objects/typeobject.c:1128
1128            _PyObject_GC_UNTRACK(self);

Stack trace:
#0  0x0000000000471ed7 in subtype_dealloc (self=0x7ffff16a3860) at Objects/typeobject.c:1128
#1  0x00000000004563f4 in dict_dealloc (mp=0x7fffe95444c8) at Objects/dictobject.c:1383
#2  0x0000000000472140 in subtype_dealloc (self=0x7fffe8e1fcf8) at Objects/typeobject.c:1157
#3  0x000000000058ad22 in frame_dealloc (f=0x1145258) at Objects/frameobject.c:429
#4  0x0000000000588fd7 in gen_send_ex (gen=gen@entry=0x7fffe8c5cea0, gen=gen@entry=0x7fffe8c5cea0, exc=0, arg=arg@entry=0x0) at Objects/genobject.c:145
#5  _PyGen_Send (gen=gen@entry=0x7fffe8c5cea0, arg=arg@entry=0x87fec0 <_Py_FalseStruct>) at Objects/genobject.c:158
#6  0x00000000004da329 in PyEval_EvalFrameEx (f=f@entry=0x11faaf8, throwflag=throwflag@entry=0) at Python/ceval.c:1906
#7  0x0000000000588e5c in gen_send_ex (gen=gen@entry=0x7fffe8c5ccf0, gen=gen@entry=0x7fffe8c5ccf0, exc=0, arg=arg@entry=0x7ffff1bb6420) at Objects/genobject.c:104
#8  _PyGen_Send (gen=gen@entry=0x7fffe8c5ccf0, arg=arg@entry=0x87fec0 <_Py_FalseStruct>) at Objects/genobject.c:158
#9  0x00000000004da329 in PyEval_EvalFrameEx (f=f@entry=0x10daa48, throwflag=throwflag@entry=0) at Python/ceval.c:1906
#10 0x0000000000588e5c in gen_send_ex (gen=gen@entry=0x7fffe8c5cc18, gen=gen@entry=0x7fffe8c5cc18, exc=0, arg=arg@entry=0x102aac8) at Objects/genobject.c:104
#11 _PyGen_Send (gen=gen@entry=0x7fffe8c5cc18, arg=arg@entry=0x87fec0 <_Py_FalseStruct>) at Objects/genobject.c:158
#12 0x00000000004da329 in PyEval_EvalFrameEx (f=f@entry=0x10da7f8, throwflag=throwflag@entry=0) at Python/ceval.c:1906
#13 0x0000000000588e5c in gen_send_ex (gen=0x7ffff16b65a0, gen=0x7ffff16b65a0, exc=0, arg=<optimized out>) at Objects/genobject.c:104
#14 _PyGen_Send (gen=0x7ffff16b65a0, arg=<optimized out>) at Objects/genobject.c:158
#15 0x00000000004d9d0b in call_function (oparg=<optimized out>, pp_stack=0x7fffffffd290) at Python/ceval.c:4221
#16 PyEval_EvalFrameEx (f=f@entry=0x10da568, throwflag=throwflag@entry=0) at Python/ceval.c:2836
#17 0x00000000004d28d4 in PyEval_EvalCodeEx (_co=0x7ffff1e0c030, globals=<optimized out>, locals=<optimized out>, args=<optimized out>, argcount=3, kws=0x12e38b8, kwcount=0,
    defs=0x7ffff1e0a8e0, defcount=2, kwdefs=0x0, closure=0x7ffff1e10588) at Python/ceval.c:3585
#18 0x00000000004d8e75 in fast_function (nk=<optimized out>, na=3, n=3, pp_stack=0x7fffffffd490, func=0x7ffff1e122f0) at Python/ceval.c:4341
#19 call_function (oparg=<optimized out>, pp_stack=0x7fffffffd490) at Python/ceval.c:4259
#20 PyEval_EvalFrameEx (f=f@entry=0x12e3708, throwflag=throwflag@entry=0) at Python/ceval.c:2836
#21 0x00000000004d28d4 in PyEval_EvalCodeEx (_co=0x7ffff1e0c0c0, globals=<optimized out>, locals=locals@entry=0x0, args=args@entry=0x7fffeae8a720, argcount=2, kws=kws@entry=0x0,
    kwcount=kwcount@entry=0, defs=defs@entry=0x0, defcount=defcount@entry=0, kwdefs=0x0, closure=0x0) at Python/ceval.c:3585
#22 0x000000000058c67c in function_call (func=0x7ffff1e12378, arg=0x7fffeae8a708, kw=0x0) at Objects/funcobject.c:632
#23 0x000000000042686a in PyObject_Call (func=func@entry=0x7ffff1e12378, arg=arg@entry=0x7fffeae8a708, kw=kw@entry=0x0) at Objects/abstract.c:2067
#24 0x00000000004d48a4 in ext_do_call (nk=<optimized out>, na=1, flags=<optimized out>, pp_stack=0x7fffffffd760, func=0x7ffff1e12378) at Python/ceval.c:4558
#25 PyEval_EvalFrameEx (f=f@entry=0x145ac48, throwflag=throwflag@entry=0) at Python/ceval.c:2876
#26 0x00000000004d9e82 in fast_function (nk=<optimized out>, na=1, n=1, pp_stack=0x7fffffffd8a0, func=0x7ffff1e2d488) at Python/ceval.c:4331
#27 call_function (oparg=<optimized out>, pp_stack=0x7fffffffd8a0) at Python/ceval.c:4259
#28 PyEval_EvalFrameEx (f=f@entry=0x11aee28, throwflag=throwflag@entry=0) at Python/ceval.c:2836
#29 0x00000000004d9e82 in fast_function (nk=<optimized out>, na=1, n=1, pp_stack=0x7fffffffd9f0, func=0x7ffff1bc1400) at Python/ceval.c:4331
#30 call_function (oparg=<optimized out>, pp_stack=0x7fffffffd9f0) at Python/ceval.c:4259
#31 PyEval_EvalFrameEx (f=f@entry=0x10d9c68, throwflag=throwflag@entry=0) at Python/ceval.c:2836
#32 0x00000000004d9e82 in fast_function (nk=<optimized out>, na=1, n=1, pp_stack=0x7fffffffdb40, func=0x7ffff1bc0268) at Python/ceval.c:4331
#33 call_function (oparg=<optimized out>, pp_stack=0x7fffffffdb40) at Python/ceval.c:4259
#34 PyEval_EvalFrameEx (f=f@entry=0xf9ed08, throwflag=throwflag@entry=0) at Python/ceval.c:2836
#35 0x00000000004d9e82 in fast_function (nk=<optimized out>, na=0, n=0, pp_stack=0x7fffffffdc90, func=0x7ffff16a4d08) at Python/ceval.c:4331
#36 call_function (oparg=<optimized out>, pp_stack=0x7fffffffdc90) at Python/ceval.c:4259
#37 PyEval_EvalFrameEx (f=f@entry=0x7ffff7f4b438, throwflag=throwflag@entry=0) at Python/ceval.c:2836
#38 0x00000000004dae2a in PyEval_EvalCodeEx (closure=0x0, kwdefs=0x0, defcount=0, defs=0x0, kwcount=0, kws=0x0, argcount=0, args=0x0, locals=locals@entry=0x7ffff7f95d50,
    globals=globals@entry=0x100000000, _co=0x7ffff7ee0150, _co@entry=0x8ac460) at Python/ceval.c:3585
#39 PyEval_EvalCode (co=co@entry=0x7ffff7ee0150, globals=globals@entry=0x7ffff7f48408, locals=locals@entry=0x7ffff7f48408) at Python/ceval.c:773
#40 0x000000000050a8f8 in run_mod (arena=0x95d9a0, flags=0x7fffffffdee0, locals=0x7ffff7f48408, globals=0x7ffff7f48408, filename=0x7ffff7e3f540, mod=0x8fb1e0)
    at Python/pythonrun.c:2180
#41 PyRun_FileExFlags (fp=0x8fa9c0, filename_str=<optimized out>, start=<optimized out>, globals=0x7ffff7f48408, locals=0x7ffff7f48408, closeit=1, flags=0x7fffffffdee0)
    at Python/pythonrun.c:2133
#42 0x000000000050c8f5 in PyRun_SimpleFileExFlags (fp=0x8fa9c0, filename=<optimized out>, closeit=1, flags=0x7fffffffdee0) at Python/pythonrun.c:1606
#43 0x000000000041edc4 in run_file (p_cf=0x7fffffffdee0, filename=0x8ac2a0 L"./torrent_manager.py", fp=0x8fa9c0) at Modules/main.c:319
#44 Py_Main (argc=argc@entry=2, argv=argv@entry=0x8ab010) at Modules/main.c:751
#45 0x000000000041b227 in main (argc=2, argv=<optimized out>) at ./Modules/python.c:69
msg233609 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2015-01-07 23:48
That's going to be really hard to debug. Is it always the same stack trace? Could it be slowly running out of memory? (The consistent time until it happens points to this.) A segfault in dealloc might point to a refcounting bug. The rarity of the even might mean that it only happens in some rare case (maybe the refcount bug is in an error path that is only triggered by running out of memory somewhere?). Is mysql totally unsuspected?
msg233610 - (view) Author: Ivailo Karamanolev (karamanolev) Date: 2015-01-08 00:03
Running out of memory is unlikely. There are many other processes running on the machine and none of them show signs of memory issues. In addition, the munin memory usage reporting doesn't show signs of steady growing or drops when the process crashes (munin screenshot attached).

3.4.0 and 3.4.2 on all runs showed exactly the same stack trace.

AFAIK mysql-connector-python is pure Python, so even if the problem was there, I think it's still Python core. I'm not 100% on this though.
msg233611 - (view) Author: Ivailo Karamanolev (karamanolev) Date: 2015-01-08 00:08
What I forgot to mention is that I'm using aiohttp for making lots of HTTP requests. http://bugs.python.org/issue21435 mentions asyncio and aiohttp and is somewhat memory management related so I thought I'm hitting a different manifestation of it and upgrading to 3.4.2 would fix it, but it didn't. I don't know enough about cpython to know if they are related.
msg233612 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2015-01-08 00:10
> that's 3.4.0

I suggest you to upgrade to Python 3.4.2. Your issue may already be fixed. For example, there was the issue #21435 in the garbage collector. The issue was found with an asyncio code.

Your crash occurs at "_PyObject_GC_UNTRACK(self);" which is also related to the garbage collector.
msg233613 - (view) Author: Ivailo Karamanolev (karamanolev) Date: 2015-01-08 00:13
@haypo

Maybe I didn't put it prominently enough, but I upgraded to 3.4.2 by downloading the source and installing it. The same crash with the same stack trace happens.
msg233614 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2015-01-08 00:18
> upgrading to 3.4.2 would fix it, but it didn't

Oh sorry, I missed the part where you wrote that you already tested Python 3.4.2.

By the way, if you already compile your own Python: try to configure Python with ./configure --with-pydebug. It adds much more debug checks to check internal consistency.

> Is mysql totally unsuspected?

It looks like mysql-connector-python is written in pure Python. Are you using other external modules which are implemented in C?

--

Would it be possible to share your source code to try to reproduce the issue? You might send it to me privately if you prefer.
msg233615 - (view) Author: Ivailo Karamanolev (karamanolev) Date: 2015-01-08 00:29
> try to configure Python with ./configure --with-pydebug
I'm installing 3.4.2 with this option right now and I'll reboot and run the script with gdb attached. I might be a few days before I have a result.

> Are you using other external modules which are implemented in C
The only module that I'm using that's implemented in C is ujson https://pypi.python.org/pypi/ujson . If you think this is relevant, I can easily try changing it to use the built-in json.

> Would it be possible to share your source code
I can share it privately, yes, but running it on your machine might be too complicated. Sharing the source code will probably be only useful for you to look at what libraries I am using and how I'm using asyncio.

What I can do is try running it on another machine with another setup. While it is currently running on a physical machine that I set up, I can run it in parallel on a OpenVZ VPS that is installed by the VPS provider, so it should give a completely independent environment. Would that help? If yes, should I also install 3.4.2 from source with --with-pydebug?
msg233616 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2015-01-08 00:49
> The only module that I'm using that's implemented in C is ujson https://pypi.python.org/pypi/ujson . If you think this is relevant, I can easily try changing it to use the built-in json.

To investigate an issue, we have to reduce the scope. If your program works with json, please use it instead yes. If the program still crash with the builtin json, it means that usjon is not the problem. If the program doesn't crash anymore, your issue may come from ujson and the investigate would be completly different...
msg233987 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2015-01-14 01:20
@Ivailo: Any progress on your investigation?
msg233989 - (view) Author: Ivailo Karamanolev (karamanolev) Date: 2015-01-14 01:25
@Victor It has been running for 5 days now on 3.4.2 using json instead of ujson. I want to give at least 10 days to crash and if it's still up, I plan to switch to simplejson to see how that will go, since json is quite slow. I'll keep playing with using different modules and when/if I have something significant that I can reproduce - I'll post here. If the problems are resolved, I'll close the ticket. If that doesn't sound reasonable, please let me know.
msg234747 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2015-01-26 14:19
Since the issue looks to be related to a third party module (ujson), I close this issue.

Note: On UNIX, asyncio is fully implemented in Python, I don't see how it could corrupt internal Python structures.
msg237742 - (view) Author: Michael Goldish (michaelg) Date: 2015-03-10 10:44
I'm not sure where to post this as the issue is closed:

I've encountered the same problem. I don't have a stack trace to share at the moment but I know it's almost (or entirely) identical.

My context:
- Windows
- Happens with Python 3.4.2 and 3.4.3 (and possibly earlier versions)
- Pure Python (no external binary modules whatsoever)
- Pure asyncio (no Tornado)
- Not using json/ujson
- Plenty of memory available

The crash seems to occur at random, with anything between 1 and 20 days separating consecutive crashes.

I have core dumps if you're interested but am not sure where to upload them.
msg237745 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2015-03-10 11:05
> I'm not sure where to post this as the issue is closed:

Please open a new issue.

> I've encountered the same problem.

Why do you consider that it's the same issue?

> The crash seems to occur at random, with anything between 1 and 20 days separating consecutive crashes.

Outch. I hate such random bugs.

> I have core dumps if you're interested but am not sure where to upload them.

I don't know how to analyze a Windows core dump.

Sorry, but without more information, it will be almost impossible to fix your issue. You can try to use faulthandler to write the traceback of all Python threads where the crash occurred:
https://docs.python.org/dev/library/faulthandler.html#faulthandler.enable

Write the ouput into a file. The output is text.


Brian Curtin wrote a similar tool but more specific to Windows:
https://bitbucket.org/briancurtin/minidumper/

"minidumper writes "minidumps" for post-mortem analysis of crashes in Python or its extensions."

Problem: I don't know how to use such dump!? I guess that Visual Studio can load such minidump files.
https://msdn.microsoft.com/en-us/library/windows/desktop/ee416349%28v=vs.85%29.aspx#analyzing_a_minidump


Did you see some errors or warnings when running your application with asyncio in debug mode?
https://docs.python.org/dev/library/asyncio-dev.html

Please use at least Python 3.4.3 which embeds a recent version of asyncio, it enables more checks in debug mode.


Can you maybe send me the source code of your application?
msg238061 - (view) Author: Michael Goldish (michaelg) Date: 2015-03-13 22:40
> Why do you consider that it's the same issue?

Because it's a very similar stack trace, the crash itself is in subtype_dealloc(), it happens once every few days, asyncio is involved, and the issue is relatively new - I couldn't find another bug report with a similar description.

> I have core dumps if you're interested but am not sure where to upload them.

> You can try to use faulthandler to write the traceback of all Python threads where the crash occurred

I've enabled faulthandler, the program crashed, and nothing was written to the file. For reference, I caused an artificial crash in the same Python program (using ctypes) and the stack traces *were* written. I verified this several times, both with the real crash and the artificial one, and the real crash always left the file empty.

> I guess that Visual Studio can load such minidump files.

I've already looked at one of my dumps in Visual Studio but couldn't make sense of what was going on. The structures involved seemed to contain data that made no sense to me (e.g. some pointers had values like 0x10, which usually makes no sense for a pointer and seems to indicate memory corruption).

> Did you see some errors or warnings when running your application with asyncio in debug mode?

No, but I'll try. I doubt the problem is in asyncio itself because it's mostly written in Python. This looks like a garbage collection issue.

> Can you maybe send me the source code of your application?

I'll ask to see if that's permissible, but either way you will probably have a hard time reproducing the crash - the program is a server which maintains active TCP connections to more than 1000 client instances, writes large text files in background threads, pickles large objects... The frequency of the crash seems to be correlated to how busy the server is. Without a proper setup it'll be difficult to make it busy.
msg238065 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2015-03-14 00:48
Are we all in agreement that it's probably somehow running out of memory?
msg238066 - (view) Author: Michael Goldish (michaelg) Date: 2015-03-14 01:41
I don't see a reason to assume the machine was running out of memory. After each crash the process was kept alive by one of those Windows crash dialogs - "the process terminated unexpectedly" or similar. I could see exactly how much memory was occupied by the process and it was never a significant percentage of the machine's total memory. It sometimes exceeded 2GB but I'm using a 64 bit build so this shouldn't be a problem.
msg238069 - (view) Author: Michael Goldish (michaelg) Date: 2015-03-14 02:27
I caught another crash just now, this time in update_refs(). A stack trace is attached below. I still think this is the same issue.

static void
update_refs(PyGC_Head *containers)
{
    PyGC_Head *gc = containers->gc.gc_next;
0000000067382D60  mov         rdx,qword ptr [rcx]  
    for (; gc != containers; gc = gc->gc.gc_next) {
0000000067382D63  cmp         rdx,rcx  
0000000067382D66  je          update_refs+28h (67382D88h)  
0000000067382D68  nop         dword ptr [rax+rax]  
        assert(_PyGCHead_REFS(gc) == GC_REACHABLE);
        _PyGCHead_SET_REFS(gc, Py_REFCNT(FROM_GC(gc)));
0000000067382D70  and         qword ptr [rdx+10h],1  

Here rdx is 0, which means we're writing to 0x10, probably because the member gc_refs is at offset 0x10 of struct PyGC_Head.gc. So I suppose containers->gc.gc_next was NULL.

(In case this is helpful: somehow, possibly due to compiler optimizations, Visual Studio claims that containers->gc.gc_next is 0x34. I'm not sure what to make of this. It also claims that containers->gc.gc_prev->gc.gc_next is 0x3e, and that containers->gc.gc_prev->gc.gc_prev->gc.gc_next is 0x3e, and so on... gc_prev always seems fine and gc_next is always 0x3e, except for the first one which is 0x34. I'm attaching a screenshot to make this clearer.)

Stack trace (Python 3.4.2, 64 bit, Windows):

python34.dll!update_refs(_gc_head * containers=0x00000000676af8e0)  Line 345
python34.dll!collect(int generation=-290088656, __int64 * n_collected=0x00000000f166e920, __int64 * n_uncollectable=0x0000000000000000, int nofail=0)  Line 969
python34.dll!collect_with_callback(int generation=-290088656)  Line 1141
python34.dll!_PyObject_GC_Malloc(unsigned __int64 basicsize=4046951880)  Line 1739
python34.dll!_PyObject_GC_New(_typeobject * tp=0x0000000001c624f0)  Line 1749
python34.dll!PyList_New(__int64 size=0)  Line 159 + 0xc bytes
python34.dll!PyEval_EvalFrameEx(_frame * f=0x00000000bab65b08, int throwflag=-244913096)  Line 2346
python34.dll!fast_function(_object * func=0x0000000000000003, _object * * * pp_stack=0x00000000f77684e0, int n=102445400, int na=1732453353, int nk=0)  Line 4332
python34.dll!call_function(_object * * * pp_stack=0x00000000f166ec29, int oparg=131)  Line 4260
python34.dll!PyEval_EvalFrameEx(_frame * f=0x00000000601cbd68, int throwflag=-244912600)  Line 2838
python34.dll!fast_function(_object * func=0x0000000000000003, _object * * * pp_stack=0x00000000f7768f28, int n=56017240, int na=1732453353, int nk=0)  Line 4332
python34.dll!call_function(_object * * * pp_stack=0x00000000f166ee19, int oparg=131)  Line 4260
python34.dll!PyEval_EvalFrameEx(_frame * f=0x00000000c65ff8f8, int throwflag=0)  Line 2838
python34.dll!PyEval_EvalCodeEx(_object * _co=0x0000000000000002, _object * globals=0x0000000000000002, _object * locals=0x0000000000000000, _object * * args=0x000000000358d248, int argcount=2, _object * * kws=0x0000000001c50060, int kwcount=0, _object * * defs=0x0000000000000000, int defcount=0, _object * kwdefs=0x0000000000000000, _object * closure=0x0000000000000000)  Line 3585 + 0xa bytes
python34.dll!function_call(_object * func=0x000000000355f048, _object * arg=0x00000000f12f7688, _object * kw=0x0000000000000000)  Line 638 + 0x45 bytes
python34.dll!PyObject_Call(_object * func=0x00000000f12f7688, _object * arg=0x00000000f11c4d08, _object * kw=0x00000000f4e9ba58)  Line 2068
python34.dll!ext_do_call(_object * func=0x000000000355f048, _object * * * pp_stack=0x00000000f166f0d9, int flags=-200649216, int na=1, int nk=0)  Line 4558 + 0xe bytes
python34.dll!PyEval_EvalFrameEx(_frame * f=0x00000000f40a5458, int throwflag=-244911400)  Line 2879
python34.dll!fast_function(_object * func=0x0000000000000001, _object * * * pp_stack=0x00000000f1380f98, int n=45993224, int na=1732453353, int nk=0)  Line 4332
python34.dll!call_function(_object * * * pp_stack=0x00000000f166f2c9, int oparg=131)  Line 4260
python34.dll!PyEval_EvalFrameEx(_frame * f=0x00000000d6922548, int throwflag=0)  Line 2838
python34.dll!PyEval_EvalCodeEx(_object * _co=0x0000000000000002, _object * globals=0x0000000000000002, _object * locals=0x0000000000000000, _object * * args=0x0000000002334200, int argcount=2, _object * * kws=0x0000000001c50060, int kwcount=0, _object * * defs=0x0000000000000000, int defcount=0, _object * kwdefs=0x0000000000000000, _object * closure=0x0000000000000000)  Line 3585 + 0xa bytes
python34.dll!function_call(_object * func=0x0000000002bdcbf8, _object * arg=0x00000000e41ef808, _object * kw=0x0000000000000000)  Line 638 + 0x45 bytes
python34.dll!PyObject_Call(_object * func=0x00000000e41ef808, _object * arg=0x00000000ef4ad308, _object * kw=0x00000000deda7148)  Line 2068
python34.dll!ext_do_call(_object * func=0x0000000002bdcbf8, _object * * * pp_stack=0x00000000f166f589, int flags=-280305184, int na=0, int nk=0)  Line 4558 + 0xe bytes
python34.dll!PyEval_EvalFrameEx(_frame * f=0x00000000ef4ae048, int throwflag=-244910200)  Line 2879
python34.dll!fast_function(_object * func=0x0000000000000001, _object * * * pp_stack=0x00000000ef4a7c50, int n=44825728, int na=1732453353, int nk=0)  Line 4332
python34.dll!call_function(_object * * * pp_stack=0x00000000f166f779, int oparg=131)  Line 4260
python34.dll!PyEval_EvalFrameEx(_frame * f=0x00000000db168948, int throwflag=-244909704)  Line 2838
python34.dll!fast_function(_object * func=0x0000000000000001, _object * * * pp_stack=0x00000000ef4a7c50, int n=44826272, int na=1732453353, int nk=0)  Line 4332
python34.dll!call_function(_object * * * pp_stack=0x00000000f166f969, int oparg=131)  Line 4260
python34.dll!PyEval_EvalFrameEx(_frame * f=0x00000000ef4ab418, int throwflag=0)  Line 2838
python34.dll!PyEval_EvalCodeEx(_object * _co=0x0000000000000001, _object * globals=0x0000000000000001, _object * locals=0x0000000000000000, _object * * args=0x0000000000000000, int argcount=1, _object * * kws=0x0000000000000000, int kwcount=0, _object * * defs=0x0000000000000000, int defcount=0, _object * kwdefs=0x0000000000000000, _object * closure=0x0000000000000000)  Line 3585 + 0xa bytes
python34.dll!function_call(_object * func=0x0000000002abfd08, _object * arg=0x00000000e22ebef0, _object * kw=0x0000000000000000)  Line 638 + 0x45 bytes
python34.dll!PyObject_Call(_object * func=0x00000000e22ebef0, _object * arg=0x0000000000000000, _object * kw=0x0000000001c50048)  Line 2068
python34.dll!method_call(_object * func=0x0000000002d36148, _object * arg=0x0000000001c50048, _object * kw=0x0000000000000000)  Line 348
python34.dll!PyObject_Call(_object * func=0x0000000001c50048, _object * arg=0x0000000000000000, _object * kw=0x0000000002d36148)  Line 2068
python34.dll!PyEval_CallObjectWithKeywords(_object * func=0x00000000c0a69f40, _object * arg=0x00000000673b1100, _object * kw=0x0000000000000000)  Line 4112
python34.dll!t_bootstrap(void * boot_raw=0x00000000dacc7d00)  Line 1000 + 0x17 bytes
python34.dll!bootstrap(void * call=0x00000000dacc7d00)  Line 177
msvcr100.dll!_callthreadstartex()  Line 314 + 0xd bytes
msvcr100.dll!_threadstartex(void * ptd=0x0000000000000000)  Line 292 + 0x5 bytes
kernel32.dll!0000000076eb5a4d() 	
[Frames below may be incorrect and/or missing, no symbols loaded for kernel32.dll]	
ntdll.dll!00000000775aba01()
msg238072 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2015-03-14 04:42
So what do you make of this?

PyObject_GC_Malloc(unsigned __int64 basicsize=4046951880)

That's nearly 4 GB. I somehow doubt your app is actually trying to allocate
that much memory -- maybe the type object a few lines below in the stack is
overwritten, or maybe the compiler is lying about everything. Is there any
chance you can run an unoptimized build?

On Fri, Mar 13, 2015 at 7:27 PM, Michael Goldish <report@bugs.python.org>
wrote:

>
> Michael Goldish added the comment:
>
> I caught another crash just now, this time in update_refs(). A stack trace
> is attached below. I still think this is the same issue.
>
> static void
> update_refs(PyGC_Head *containers)
> {
>     PyGC_Head *gc = containers->gc.gc_next;
> 0000000067382D60  mov         rdx,qword ptr [rcx]
>     for (; gc != containers; gc = gc->gc.gc_next) {
> 0000000067382D63  cmp         rdx,rcx
> 0000000067382D66  je          update_refs+28h (67382D88h)
> 0000000067382D68  nop         dword ptr [rax+rax]
>         assert(_PyGCHead_REFS(gc) == GC_REACHABLE);
>         _PyGCHead_SET_REFS(gc, Py_REFCNT(FROM_GC(gc)));
> 0000000067382D70  and         qword ptr [rdx+10h],1
>
> Here rdx is 0, which means we're writing to 0x10, probably because the
> member gc_refs is at offset 0x10 of struct PyGC_Head.gc. So I suppose
> containers->gc.gc_next was NULL.
>
> (In case this is helpful: somehow, possibly due to compiler optimizations,
> Visual Studio claims that containers->gc.gc_next is 0x34. I'm not sure what
> to make of this. It also claims that containers->gc.gc_prev->gc.gc_next is
> 0x3e, and that containers->gc.gc_prev->gc.gc_prev->gc.gc_next is 0x3e, and
> so on... gc_prev always seems fine and gc_next is always 0x3e, except for
> the first one which is 0x34. I'm attaching a screenshot to make this
> clearer.)
>
> Stack trace (Python 3.4.2, 64 bit, Windows):
>
> python34.dll!update_refs(_gc_head * containers=0x00000000676af8e0)  Line
> 345
> python34.dll!collect(int generation=-290088656, __int64 *
> n_collected=0x00000000f166e920, __int64 *
> n_uncollectable=0x0000000000000000, int nofail=0)  Line 969
> python34.dll!collect_with_callback(int generation=-290088656)  Line 1141
> python34.dll!_PyObject_GC_Malloc(unsigned __int64 basicsize=4046951880)
> Line 1739
> python34.dll!_PyObject_GC_New(_typeobject * tp=0x0000000001c624f0)  Line
> 1749
> python34.dll!PyList_New(__int64 size=0)  Line 159 + 0xc bytes
> python34.dll!PyEval_EvalFrameEx(_frame * f=0x00000000bab65b08, int
> throwflag=-244913096)  Line 2346
> python34.dll!fast_function(_object * func=0x0000000000000003, _object * *
> * pp_stack=0x00000000f77684e0, int n=102445400, int na=1732453353, int
> nk=0)  Line 4332
> python34.dll!call_function(_object * * * pp_stack=0x00000000f166ec29, int
> oparg=131)  Line 4260
> python34.dll!PyEval_EvalFrameEx(_frame * f=0x00000000601cbd68, int
> throwflag=-244912600)  Line 2838
> python34.dll!fast_function(_object * func=0x0000000000000003, _object * *
> * pp_stack=0x00000000f7768f28, int n=56017240, int na=1732453353, int
> nk=0)  Line 4332
> python34.dll!call_function(_object * * * pp_stack=0x00000000f166ee19, int
> oparg=131)  Line 4260
> python34.dll!PyEval_EvalFrameEx(_frame * f=0x00000000c65ff8f8, int
> throwflag=0)  Line 2838
> python34.dll!PyEval_EvalCodeEx(_object * _co=0x0000000000000002, _object *
> globals=0x0000000000000002, _object * locals=0x0000000000000000, _object *
> * args=0x000000000358d248, int argcount=2, _object * *
> kws=0x0000000001c50060, int kwcount=0, _object * * defs=0x0000000000000000,
> int defcount=0, _object * kwdefs=0x0000000000000000, _object *
> closure=0x0000000000000000)  Line 3585 + 0xa bytes
> python34.dll!function_call(_object * func=0x000000000355f048, _object *
> arg=0x00000000f12f7688, _object * kw=0x0000000000000000)  Line 638 + 0x45
> bytes
> python34.dll!PyObject_Call(_object * func=0x00000000f12f7688, _object *
> arg=0x00000000f11c4d08, _object * kw=0x00000000f4e9ba58)  Line 2068
> python34.dll!ext_do_call(_object * func=0x000000000355f048, _object * * *
> pp_stack=0x00000000f166f0d9, int flags=-200649216, int na=1, int nk=0)
> Line 4558 + 0xe bytes
> python34.dll!PyEval_EvalFrameEx(_frame * f=0x00000000f40a5458, int
> throwflag=-244911400)  Line 2879
> python34.dll!fast_function(_object * func=0x0000000000000001, _object * *
> * pp_stack=0x00000000f1380f98, int n=45993224, int na=1732453353, int
> nk=0)  Line 4332
> python34.dll!call_function(_object * * * pp_stack=0x00000000f166f2c9, int
> oparg=131)  Line 4260
> python34.dll!PyEval_EvalFrameEx(_frame * f=0x00000000d6922548, int
> throwflag=0)  Line 2838
> python34.dll!PyEval_EvalCodeEx(_object * _co=0x0000000000000002, _object *
> globals=0x0000000000000002, _object * locals=0x0000000000000000, _object *
> * args=0x0000000002334200, int argcount=2, _object * *
> kws=0x0000000001c50060, int kwcount=0, _object * * defs=0x0000000000000000,
> int defcount=0, _object * kwdefs=0x0000000000000000, _object *
> closure=0x0000000000000000)  Line 3585 + 0xa bytes
> python34.dll!function_call(_object * func=0x0000000002bdcbf8, _object *
> arg=0x00000000e41ef808, _object * kw=0x0000000000000000)  Line 638 + 0x45
> bytes
> python34.dll!PyObject_Call(_object * func=0x00000000e41ef808, _object *
> arg=0x00000000ef4ad308, _object * kw=0x00000000deda7148)  Line 2068
> python34.dll!ext_do_call(_object * func=0x0000000002bdcbf8, _object * * *
> pp_stack=0x00000000f166f589, int flags=-280305184, int na=0, int nk=0)
> Line 4558 + 0xe bytes
> python34.dll!PyEval_EvalFrameEx(_frame * f=0x00000000ef4ae048, int
> throwflag=-244910200)  Line 2879
> python34.dll!fast_function(_object * func=0x0000000000000001, _object * *
> * pp_stack=0x00000000ef4a7c50, int n=44825728, int na=1732453353, int
> nk=0)  Line 4332
> python34.dll!call_function(_object * * * pp_stack=0x00000000f166f779, int
> oparg=131)  Line 4260
> python34.dll!PyEval_EvalFrameEx(_frame * f=0x00000000db168948, int
> throwflag=-244909704)  Line 2838
> python34.dll!fast_function(_object * func=0x0000000000000001, _object * *
> * pp_stack=0x00000000ef4a7c50, int n=44826272, int na=1732453353, int
> nk=0)  Line 4332
> python34.dll!call_function(_object * * * pp_stack=0x00000000f166f969, int
> oparg=131)  Line 4260
> python34.dll!PyEval_EvalFrameEx(_frame * f=0x00000000ef4ab418, int
> throwflag=0)  Line 2838
> python34.dll!PyEval_EvalCodeEx(_object * _co=0x0000000000000001, _object *
> globals=0x0000000000000001, _object * locals=0x0000000000000000, _object *
> * args=0x0000000000000000, int argcount=1, _object * *
> kws=0x0000000000000000, int kwcount=0, _object * * defs=0x0000000000000000,
> int defcount=0, _object * kwdefs=0x0000000000000000, _object *
> closure=0x0000000000000000)  Line 3585 + 0xa bytes
> python34.dll!function_call(_object * func=0x0000000002abfd08, _object *
> arg=0x00000000e22ebef0, _object * kw=0x0000000000000000)  Line 638 + 0x45
> bytes
> python34.dll!PyObject_Call(_object * func=0x00000000e22ebef0, _object *
> arg=0x0000000000000000, _object * kw=0x0000000001c50048)  Line 2068
> python34.dll!method_call(_object * func=0x0000000002d36148, _object *
> arg=0x0000000001c50048, _object * kw=0x0000000000000000)  Line 348
> python34.dll!PyObject_Call(_object * func=0x0000000001c50048, _object *
> arg=0x0000000000000000, _object * kw=0x0000000002d36148)  Line 2068
> python34.dll!PyEval_CallObjectWithKeywords(_object *
> func=0x00000000c0a69f40, _object * arg=0x00000000673b1100, _object *
> kw=0x0000000000000000)  Line 4112
> python34.dll!t_bootstrap(void * boot_raw=0x00000000dacc7d00)  Line 1000 +
> 0x17 bytes
> python34.dll!bootstrap(void * call=0x00000000dacc7d00)  Line 177
> msvcr100.dll!_callthreadstartex()  Line 314 + 0xd bytes
> msvcr100.dll!_threadstartex(void * ptd=0x0000000000000000)  Line 292 + 0x5
> bytes
> kernel32.dll!0000000076eb5a4d()
> [Frames below may be incorrect and/or missing, no symbols loaded for
> kernel32.dll]
> ntdll.dll!00000000775aba01()
>
> ----------
> Added file: http://bugs.python.org/file38478/gc_next.png
>
> _______________________________________
> Python tracker <report@bugs.python.org>
> <http://bugs.python.org/issue23187>
> _______________________________________
>
msg238074 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2015-03-14 08:29
It looks like you are running your app on Windows. Are you using the
proactor event loop? I guess yes since you have more than 1000 clients and
select() is limited to 500 sockets.

In Python 3.4.3, I fixed a *lot* of crashes and race conditions in the
proactor event loop. There are maybe more race conditions. Are you running
subprocesses with asyncio?

Le 14 mars 2015 05:42, "Guido van Rossum" <report@bugs.python.org> a écrit :
> PyObject_GC_Malloc(unsigned __int64 basicsize=4046951880)
>
> That's nearly 4 GB. I somehow doubt your app is actually trying to
allocate
> that much memory

The caller is PyList_New(0).

When using a debugger on an optimized build, you should not trust the
debugger. For example, variables can be stored in registers. Displaying
values in the old frames can just display junk. In gdb, you see
<optimized>. The fast call convention uses registers to pass first
parameters to functions. I'm not sure that debuggers handle this case
correctly. Etc.
msg238082 - (view) Author: Michael Goldish (michaelg) Date: 2015-03-14 12:25
> That's nearly 4 GB. I somehow doubt your app is actually trying to allocate that much memory

There's no reason for the app to allocate that much memory in a single call. It's using almost 4 GB of memory in total today, but that makes sense given its load today.

> Is there any you can run an unoptimized build?

I'll try doing that now. I suppose I should build it myself with Py_DEBUG enabled.
msg238083 - (view) Author: Michael Goldish (michaelg) Date: 2015-03-14 13:02
> It looks like you are running your app on Windows. Are you using the proactor event loop?

Yes.

> In Python 3.4.3, I fixed a *lot* of crashes and race conditions in the proactor event loop. There are maybe more race conditions.

I've already tried Python 3.4.3 and got the same crash. The one described in my long message with the stack trace is from 3.4.2, but I got that same stack trace (or a very similar one) with 3.4.3. I have a core dump from that crash as well so I can provide the exact stack trace if necessary.

> Are you running subprocesses with asyncio?

I can't recall running any subprocesses in the server program. The clients do that very often, but they're not the ones crashing...

> The caller is PyList_New(0).

I suppose this creates an empty list?

I managed to get a stack trace from faulthandler somehow and it points to this line in one of the threads:

d = rg.item_counts_by_params.setdefault(params, {})

This runs in a thread (managed by a ThreadPoolExecutor) that computes stuff and writes it to a file. It's in a loop that runs ~200000 times per job. Several lines above there's a similar line that passes an empty list as the 2nd parameter to setdefault().

The other threads are all in threading.wait(), probably doing nothing interesting. One thread is in _run_once().

Unfortunately I don't have any additional information about this particular crash (no dump or stack trace), but it's not unreasonable to assume other crashes occurred during creation of new lists/dicts in a similar manner.
msg238139 - (view) Author: Michael Goldish (michaelg) Date: 2015-03-15 10:13
OK, I caught the crash with a debug build of Python 3.4.3.

I have a core dump and even the process itself still alive in memory. I can provide any information you need. I can also explain how to debug a core dump with Visual Studio, if necessary.

This time the crash was here:

static void
update_refs(PyGC_Head *containers)
{
    PyGC_Head *gc = containers->gc.gc_next;
    for (; gc != containers; gc = gc->gc.gc_next) {
        assert(_PyGCHead_REFS(gc) == GC_REACHABLE);
        _PyGCHead_SET_REFS(gc, Py_REFCNT(FROM_GC(gc)));
        /* Python's cyclic gc should never see an incoming refcount
         * of 0:  if something decref'ed to 0, it should have been
         * deallocated immediately at that time.
         * Possible cause (if the assert triggers):  a tp_dealloc
         * routine left a gc-aware object tracked during its teardown
         * phase, and did something-- or allowed something to happen --
         * that called back into Python.  gc can trigger then, and may
         * see the still-tracked dying object.  Before this assert
         * was added, such mistakes went on to allow gc to try to
         * delete the object again.  In a debug build, that caused
         * a mysterious segfault, when _Py_ForgetReference tried
         * to remove the object from the doubly-linked list of all
         * objects a second time.  In a release build, an actual
         * double deallocation occurred, which leads to corruption
         * of the allocator's internal bookkeeping pointers.  That's
         * so serious that maybe this should be a release-build
         * check instead of an assert?
         */
------> assert(_PyGCHead_REFS(gc) != 0); <------- crash
    }
}

In the calling frame I can see that update_refs() is called with a PyGC_Head object whose gc_refs is 0, and according to the paragraph above that shouldn't happen. A screenshot is attached.

Stack trace:

msvcr100d.dll!_NMSG_WRITE(int rterrnum=10)  Line 217
msvcr100d.dll!abort()  Line 71
msvcr100d.dll!_wassert(const wchar_t * expr=0x0000000063220618, const wchar_t * filename=0x00000000632205e0, unsigned int lineno=364)  Line 153
python34_d.dll!update_refs(_gc_head * containers=0x000000006338cfc0)  Line 364 + 0x2b bytes
python34_d.dll!collect(int generation=0, __int64 * n_collected=0x000000012beee530, __int64 * n_uncollectable=0x000000012beee520, int nofail=0)  Line 969
python34_d.dll!collect_with_callback(int generation=0)  Line 1140 + 0x16 bytes
python34_d.dll!collect_generations()  Line 1163 + 0x9 bytes
python34_d.dll!_PyObject_GC_Malloc(unsigned __int64 basicsize=56)  Line 1739
python34_d.dll!_PyObject_GC_New(_typeobject * tp=0x00000000633ce280)  Line 1748 + 0xe bytes
python34_d.dll!PyList_New(__int64 size=0)  Line 159 + 0xc bytes
python34_d.dll!PyEval_EvalFrameEx(_frame * f=0x000000010f387658, int throwflag=3)  Line 2347 + 0x8 bytes
python34_d.dll!fast_function(_object * func=0x0000000000000003, _object * * * pp_stack=0x0000000129e69948, int n=737077536, int na=1661137695, int nk=0)  Line 4335
python34_d.dll!call_function(_object * * * pp_stack=0x000000012beee919, int oparg=131)  Line 4263
python34_d.dll!PyEval_EvalFrameEx(_frame * f=0x000000011c077528, int throwflag=3)  Line 2840
python34_d.dll!fast_function(_object * func=0x0000000000000003, _object * * * pp_stack=0x0000000089a53190, int n=737078048, int na=1661137695, int nk=0)  Line 4335
python34_d.dll!call_function(_object * * * pp_stack=0x000000012beeeb19, int oparg=131)  Line 4263
python34_d.dll!PyEval_EvalFrameEx(_frame * f=0x000000010369f2a8, int throwflag=0)  Line 2840
python34_d.dll!PyEval_EvalCodeEx(_object * _co=0x0000000100000002, _object * globals=0x0000000100000002, _object * locals=0x000000001fa5e218, _object * * args=0x0000000000730088, int argcount=2, _object * * kws=0x0000000000730080, int kwcount=0, _object * * defs=0x0000000000000000, int defcount=0, _object * kwdefs=0x0000000000000000, _object * closure=0x0000000000000000)  Line 3588 + 0xf bytes
python34_d.dll!function_call(_object * func=0x0000000003c10058, _object * arg=0x000000001fa5e218, _object * kw=0x00000001296360c8)  Line 638 + 0x8a bytes
python34_d.dll!PyObject_Call(_object * func=0x0000000003c10058, _object * arg=0x000000001fa5e218, _object * kw=0x00000001296360c8)  Line 2040 + 0x13 bytes
python34_d.dll!ext_do_call(_object * func=0x00000000c42c25a0, _object * * * pp_stack=0x000000012beeee49, int flags=62980184, int na=1, int nk=0)  Line 4561 + 0xe bytes
python34_d.dll!PyEval_EvalFrameEx(_frame * f=0x00000000fa4eab18, int throwflag=1)  Line 2880
python34_d.dll!fast_function(_object * func=0x0000000000000001, _object * * * pp_stack=0x000000003b42f398, int n=737079376, int na=1661137695, int nk=0)  Line 4335
python34_d.dll!call_function(_object * * * pp_stack=0x000000012beef049, int oparg=131)  Line 4263
python34_d.dll!PyEval_EvalFrameEx(_frame * f=0x000000010f38ec28, int throwflag=0)  Line 2840
python34_d.dll!PyEval_EvalCodeEx(_object * _co=0x0000000100000002, _object * globals=0x0000000100000002, _object * locals=0x000000012a1e5d78, _object * * args=0x0000000000730088, int argcount=2, _object * * kws=0x0000000000730080, int kwcount=0, _object * * defs=0x0000000000000000, int defcount=0, _object * kwdefs=0x0000000000000000, _object * closure=0x0000000000000000)  Line 3588 + 0xf bytes
python34_d.dll!function_call(_object * func=0x00000000036d2d48, _object * arg=0x000000012a1e5d78, _object * kw=0x000000012a1f1918)  Line 638 + 0x8a bytes
python34_d.dll!PyObject_Call(_object * func=0x00000000036d2d48, _object * arg=0x000000012a1e5d78, _object * kw=0x000000012a1f1918)  Line 2040 + 0x13 bytes
python34_d.dll!ext_do_call(_object * func=0x00000000033f6040, _object * * * pp_stack=0x000000012beef379, int flags=57486664, int na=0, int nk=0)  Line 4561 + 0xe bytes
python34_d.dll!PyEval_EvalFrameEx(_frame * f=0x000000010f80f118, int throwflag=1)  Line 2880
python34_d.dll!fast_function(_object * func=0x0000000000000001, _object * * * pp_stack=0x000000012a1ee878, int n=737080704, int na=1661137695, int nk=0)  Line 4335
python34_d.dll!call_function(_object * * * pp_stack=0x000000012beef579, int oparg=131)  Line 4263
python34_d.dll!PyEval_EvalFrameEx(_frame * f=0x000000010f1c8b98, int throwflag=1)  Line 2840
python34_d.dll!fast_function(_object * func=0x0000000000000001, _object * * * pp_stack=0x000000012a1ee878, int n=737081216, int na=1661137695, int nk=0)  Line 4335
python34_d.dll!call_function(_object * * * pp_stack=0x000000012beef779, int oparg=131)  Line 4263
python34_d.dll!PyEval_EvalFrameEx(_frame * f=0x000000010f80ee88, int throwflag=0)  Line 2840
python34_d.dll!PyEval_EvalCodeEx(_object * _co=0x0000000000000001, _object * globals=0x0000000100000001, _object * locals=0x0000000000730058, _object * * args=0x0000000000000000, int argcount=1, _object * * kws=0x0000000000000000, int kwcount=0, _object * * defs=0x0000000000000000, int defcount=0, _object * kwdefs=0x0000000000000000, _object * closure=0x0000000000000000)  Line 3588 + 0xf bytes
python34_d.dll!function_call(_object * func=0x0000000003404058, _object * arg=0x000000012a1ee058, _object * kw=0x0000000000000000)  Line 638 + 0x8a bytes
python34_d.dll!PyObject_Call(_object * func=0x0000000003404058, _object * arg=0x000000012a1ee058, _object * kw=0x0000000000000000)  Line 2040 + 0x13 bytes
python34_d.dll!method_call(_object * func=0x0000000003404058, _object * arg=0x000000012a1ee058, _object * kw=0x0000000000000000)  Line 347 + 0x1d bytes
python34_d.dll!PyObject_Call(_object * func=0x0000000003826fa8, _object * arg=0x0000000000730058, _object * kw=0x0000000000000000)  Line 2040 + 0x13 bytes
python34_d.dll!PyEval_CallObjectWithKeywords(_object * func=0x0000000000000000, _object * arg=0x0000000000000000, _object * kw=0x0000000000000000)  Line 4115
python34_d.dll!t_bootstrap(void * boot_raw=0x00000000e2e0df70)  Line 1000 + 0x20 bytes
python34_d.dll!bootstrap(void * call=0x000000000070ef20)  Line 176
msvcr100d.dll!_callthreadstartex()  Line 314 + 0x17 bytes
msvcr100d.dll!_threadstartex(void * ptd=0x000000007be2d110)  Line 297
kernel32.dll!0000000076eb5a4d()
[Frames below may be incorrect and/or missing, no symbols loaded for kernel32.dll]
ntdll.dll!00000000775aba01()
msg238147 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2015-03-15 15:02
(Since you chose to comment this closed issue instead of opening a new one, I reopen this issue.)
msg238150 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2015-03-15 15:25
>> Did you see some errors or warnings when running your application with asyncio in debug mode?

> No, but I'll try. I doubt the problem is in asyncio itself because it's mostly written in Python. This looks like a garbage collection issue.

Sorry, you're wrong: the proactor event loop heavily uses the _overlapped module which is implemented in C. A crash in the garbage collector is more likely a bug in asyncio/your application, than a bug in Python itself.

Please enable asyncio debug mode, ResourceWarning, enable DEBUG log level, and check if you get some errors. For example, "GetQueuedCompletionStatus() returned an unexpected event" usually means that you will quickly get a memory corruption...

If possible, write all asyncio logs at DEBUG log level and attach them to the issue. Using the logging module, it's possible to add a custom handler for "asyncio". Example:
---
import asyncio
import logging

logging.basicConfig()

logger = asyncio.log.logger
output = open("asyncio.log", "a")
handler = logging.StreamHandler(output)
handler.setLevel(logging.DEBUG)
logger.setLevel(logging.DEBUG)
logger.addHandler(handler)

asyncio.log.logger.debug("plop")
output.close()
---

> python34_d.dll!PyList_New(__int64 size=0)  Line 159 + 0xc bytes

So the crash occurred again during a call to PyList_New(0) which triggerred a garbage collection. The backtrace looks more reliable: _PyObject_GC_Malloc(56).

> msvcr100d.dll!_wassert(const wchar_t * expr=0x0000000063220618, const wchar_t * filename=0x00000000632205e0, unsigned int lineno=364)  Line 153
> python34_d.dll!update_refs(_gc_head * containers=0x000000006338cfc0)  Line 364 + 0x2b bytes

This assertion contains a long comment which explains when the assertion can fail.

        /* Python's cyclic gc should never see an incoming refcount
         * of 0:  if something decref'ed to 0, it should have been
         * deallocated immediately at that time.
         * Possible cause (if the assert triggers):  a tp_dealloc
         * routine left a gc-aware object tracked during its teardown
         * phase, and did something-- or allowed something to happen --
         * that called back into Python.  gc can trigger then, and may
         * see the still-tracked dying object.  Before this assert
         * was added, such mistakes went on to allow gc to try to
         * delete the object again.  In a debug build, that caused
         * a mysterious segfault, when _Py_ForgetReference tried
         * to remove the object from the doubly-linked list of all
         * objects a second time.  In a release build, an actual
         * double deallocation occurred, which leads to corruption
         * of the allocator's internal bookkeeping pointers.  That's
         * so serious that maybe this should be a release-build
         * check instead of an assert?
         */
        assert(_PyGCHead_REFS(gc) != 0);

To come back to asyncio and the proactor event loop: the implementations of the proactor event loop is still young and probably unstable. Would it be possible to test again with fewer clients (500 clients or less to not hit the select() limit on Windows) with selector event loop?

A crash in the garbage collector can be caused by a memory corruption. I fixed a lot of memory corruption caused by internal objects uesd in the proactor event loop.

Can you check if asyncio.windows_events._BaseWaitHandleFuture is used? This class is used by IocpProactor.wait_for_handle() which is called by _WindowsSubprocessTransport. But you wrote you the server (where the crash occurs) doesn't run subprocesses.

I never tried the proactor event loop with threads. Are you using asyncio code outside the main thread? asyncio is not thread-safe at all :-p Are you using a single event loop or one event loop per thread?
msg238151 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2015-03-15 15:26
Bugfixes of asyncio 3.4.3, a lot of them are related to the proactor event loop (and most of them leaded to crashes):
https://code.google.com/p/tulip/source/browse/ChangeLog#66
msg238155 - (view) Author: Michael Goldish (michaelg) Date: 2015-03-15 17:11
> Sorry, you're wrong: the proactor event loop heavily uses the _overlapped module which is implemented in C. A crash in the garbage collector is more likely a bug in asyncio/your application, than a bug in Python itself.

I'm aware of that. I assumed the original crash reported in this thread was the same as mine but I can't be sure of that. And that crash was on Linux, where asyncio is pure Python AFAIK.

> Please enable asyncio debug mode, ResourceWarning, enable DEBUG log level, and check if you get some errors. For example, "GetQueuedCompletionStatus() returned an unexpected event" usually means that you will quickly get a memory corruption...

I've already done that but admittedly not for very long because the server ran so slowly that it was unresponsive. I didn't see any error or warning messages, but I did see numerous "Executing <Handle _ProactorReadPipeTransport._loop_reading(<_OverlappedF...events.py:451>)" messages.

> Would it be possible to test again with fewer clients (500 clients or less to not hit the select() limit on Windows) with selector event loop?

I'll see if I can try that, but I guess reproducing the crash can take much longer with 500 clients.

> Can you check if asyncio.windows_events._BaseWaitHandleFuture is used? This class is used by IocpProactor.wait_for_handle() which is called by _WindowsSubprocessTransport. But you wrote you the server (where the crash occurs) doesn't run subprocesses.

It's not used. I put print statements and an exit() in the constructor of _BaseWaitHandleFuture and it wasn't triggered.

> Are you using asyncio code outside the main thread?

No.

> asyncio is not thread-safe at all :-p Are you using a single event loop or one event loop per thread?

A single event loop, and I've made sure that all code running outside the main thread doesn't call any asyncio functions.
msg245426 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2015-06-16 21:14
I'm sorry, but even if it's a real bug, we don't have enough information nor any scenario to reproduce the bug, so it's not possible to investigate the bug. I close the issue.
History
Date User Action Args
2015-06-16 21:14:41hayposetstatus: open -> closed
resolution: out of date
messages: + msg245426
2015-03-15 17:11:49michaelgsetmessages: + msg238155
2015-03-15 15:26:31hayposetmessages: + msg238151
2015-03-15 15:25:52hayposetmessages: + msg238150
2015-03-15 15:02:51hayposetstatus: closed -> open
resolution: not a bug -> (no value)
messages: + msg238147
2015-03-15 10:13:29michaelgsetfiles: + gc_refs.png

messages: + msg238139
2015-03-14 13:02:51michaelgsetmessages: + msg238083
2015-03-14 12:25:36michaelgsetmessages: + msg238082
2015-03-14 08:29:58hayposetmessages: + msg238074
2015-03-14 04:42:42gvanrossumsetmessages: + msg238072
2015-03-14 02:27:46michaelgsetfiles: + gc_next.png

messages: + msg238069
2015-03-14 01:41:39michaelgsetmessages: + msg238066
2015-03-14 00:48:23gvanrossumsetmessages: + msg238065
2015-03-13 22:40:12michaelgsetmessages: + msg238061
2015-03-10 11:05:18hayposetmessages: + msg237745
2015-03-10 10:44:41michaelgsetnosy: + michaelg
messages: + msg237742
2015-01-26 14:19:52hayposetstatus: open -> closed
resolution: not a bug
messages: + msg234747
2015-01-14 01:25:16karamanolevsetmessages: + msg233989
2015-01-14 01:20:55hayposetmessages: + msg233987
2015-01-08 18:10:27r-englundsetnosy: + r-englund
2015-01-08 00:49:13hayposetmessages: + msg233616
2015-01-08 00:29:21karamanolevsetmessages: + msg233615
2015-01-08 00:18:45hayposetmessages: + msg233614
2015-01-08 00:13:08karamanolevsetmessages: + msg233613
2015-01-08 00:10:43hayposetmessages: + msg233612
2015-01-08 00:08:38karamanolevsetmessages: + msg233611
2015-01-08 00:03:58karamanolevsetfiles: + Screenshot 2015-01-07 15.57.01.png

messages: + msg233610
2015-01-07 23:48:14gvanrossumsetmessages: + msg233609
2015-01-07 23:41:35karamanolevcreate