classification
Title: test_multiprocessing_spawn: RuntimeError and assertion error on windows xp buildbot
Type: Stage:
Components: Versions: Python 3.4
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: neologix, python-dev, sbt, vstinner
Priority: normal Keywords: patch

Created on 2013-11-12 22:11 by vstinner, last changed 2013-11-18 08:42 by vstinner. This issue is now closed.

Files
File name Uploaded Description Edit
dealloc-runtimeerror.patch sbt, 2013-11-13 13:42 review
Messages (10)
msg202725 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2013-11-12 22:11
Since the changeset c2a13acd5e2b24560419b93180ee49d1a4839b92 ("Close #19466: Clear the frames of daemon threads earlier during the Python shutdown to call objects destructors"), test_multiprocessing_spawn now show RuntimeError and a child process crashs with an assertion error on windows xp buildbot.

http://buildbot.python.org/all/builders/x86 XP-4 3.x/builds/9538/steps/test/logs/stdio

[ 71/385] test_multiprocessing_spawn
RuntimeError: I/O operations still in flight while destroying Overlapped object, the process may crash
RuntimeError: I/O operations still in flight while destroying Overlapped object, the process may crash
RuntimeError: I/O operations still in flight while destroying Overlapped object, the process may crash
RuntimeError: I/O operations still in flight while destroying Overlapped object, the process may crash
RuntimeError: I/O operations still in flight while destroying Overlapped object, the process may crash
RuntimeError: I/O operations still in flight while destroying Overlapped object, the process may crash
RuntimeError: I/O operations still in flight while destroying Overlapped object, the process may crash
RuntimeError: I/O operations still in flight while destroying Overlapped object, the process may crash
RuntimeError: I/O operations still in flight while destroying Overlapped object, the process may crash
RuntimeError: I/O operations still in flight while destroying Overlapped object, the process may crash
RuntimeError: I/O operations still in flight while destroying Overlapped object, the process may crash
Assertion failed: !PyErr_Occurred(), file ..\Python\ceval.c, line 4077
msg202726 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2013-11-13 00:13
I'm able to reproduce the RuntimeError on Windows 7, it comes from a pipe. The message is probably written by a child process, not by the main process. I suppose that Richard knows better than me how to fix this warning, so I don't want to investigate it :-)

I'm unable to reproduce the "Assertion failed: !PyErr_Occurred(), file ..\Python\ceval.c, line 4077" failure on Windows 7 on my AMD64 with Python compiled in debug mode in 32-bit mode (I only have Visual Studio Express, so no 64-bit binary). I'm interested by this one, but I need a traceback, the C traceback if possible.

An option would be to enable faulthandler by monkey-patching multiprocessing.spawn.get_command_line() (to add -X faulthandler). But in my exprerience, the Python traceback doesn't help to investigate such assertion error.

I added this assertion recently in Python 3.4 to detect bugs earlier. If PyEval_CallObjectWithKeywords() is called with an exception set, the exception may be cleared or replaced with a new exception, so the original exception can be lost, which is probably not expected. For example, PyDict_GetItem() raises a KeyError and then clears the current exception.
msg202732 - (view) Author: Richard Oudkerk (sbt) * (Python committer) Date: 2013-11-13 10:08
If you have a pending overlapped operation then the associated buffer should not be deallocated until that operation is complete, or else you are liable to get a crash or memory corruption.

Unfortunately WinXP provides no reliable way to cancel a pending operation -- there is CancelIo() but that just cancels operations started by the *current thread* on a handle.  Vista introduced CancelIoEx() which allows cancellation of a specific overlapped op.

These warnings happen in the deallocator because the buffer has to be freed.

For Vista and later versions of Windows these warnings are presumably unnecessary since CancelIoEx() is used.

For WinXP the simplest thing may be to check if Py_Finalize is non-null and if so suppress the warning (possibly "leaking" the buffer since we are exiting anyway).
msg202734 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2013-11-13 10:19
"For Vista and later versions of Windows these warnings are presumably unnecessary since CancelIoEx() is used."

As close() on regular files, I would prefer to call explicitly cancel() to control exactly when the overlapped operation is cancelled. Can't you fix multiprocessing and/or the unit test to ensure that all overlapped operations are completed or cancelled?

close() does flush buffers and so may fail. Is it the same for CancelIo/CancelIoEx? In the official documentation, only one error case is described: "If this function cannot find a request to cancel, the return value is 0 (zero), and GetLastError returns ERROR_NOT_FOUND."

The warning is useful because it may be a real bug in the code. I also like ResourceWarning("unclosed file/socket ...") warnings.
msg202735 - (view) Author: Richard Oudkerk (sbt) * (Python committer) Date: 2013-11-13 10:55
> As close() on regular files, I would prefer to call explicitly cancel() 
> to control exactly when the overlapped operation is cancelled.

If you use daemon threads then you have no guarantee that the thread will ever get a chance to explicitly call cancel().

> Can't you fix multiprocessing and/or the unit test to ensure that all 
> overlapped operations are completed or cancelled?

On Vista and later, yes, this is done in the deallocator using CancelIoEx(), although there is still a warning.  On XP it is not possible because CancelIo() has to be called from the same thread which started the operation.

I think these warnings come from daemon threads used by "manager" processes.  When the manager process exits some background threads may be blocked doing an overlapped read.

(It might be possible to wake up blocked threads by setting the event handle returned by _PyOS_SigintEvent().  That might allow the use of non-daemon threads.)
msg202746 - (view) Author: Richard Oudkerk (sbt) * (Python committer) Date: 2013-11-13 13:42
I think the attached patch should fix it.  Note that with the patch the RuntimeError can probably only occur on Windows XP.

Shall I apply it?
msg202758 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2013-11-13 15:07
>> Can't you fix multiprocessing and/or the unit test to ensure that all 
>> overlapped operations are completed or cancelled?

> On Vista and later, yes, this is done in the deallocator using
> CancelIoEx(), although there is still a warning.

I don't understand. The warning is emitted because an operating is not done nor cancelled. Why not cancel explicitly active operations in manager.shutdown()? It is not possible?

> ... I think these warnings come from daemon threads used by "manager"
> processes.  When the manager process exits some background threads
> may be blocked doing an overlapped read.

I don't know overlapped operations. There are not asynchronous? What do you mean by "blocked doing an overlapped read"?
msg202761 - (view) Author: Richard Oudkerk (sbt) * (Python committer) Date: 2013-11-13 16:51
On 13/11/2013 3:07pm, STINNER Victor wrote:
>> On Vista and later, yes, this is done in the deallocator using
>> CancelIoEx(), although there is still a warning.
>
> I don't understand. The warning is emitted because an operating is not done nor cancelled. Why not cancel explicitly active operations in manager.shutdown()? It is not possible?

shutdown() will be run in a different thread to the ones which started 
the overlapped ops, so it cannot stop them using CancelIo().  And 
anyway, it would mean writing a separate implementation for Windows -- 
the current manager implementation contains no platform specific code.

Originally overlapped IO was not used on Windows.  But, to get rid of 
polling, Antoine opened the can of worms that is overlapped IO:-)

>> ... I think these warnings come from daemon threads used by "manager"
>> processes.  When the manager process exits some background threads
>> may be blocked doing an overlapped read.
>
> I don't know overlapped operations. There are not asynchronous? What do you mean by "blocked doing an overlapped read"?

They are asynchronous but the implementation uses a hidden thread pool. 
  If a pool thread tries to read from/write to a buffer that has been 
deallocated, then we can get a crash.

By "blocked doing an overlapped read" I mean that a daemon thread is 
waiting for a line like

     data = conn.recv()

to complete.
msg203145 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2013-11-17 13:24
New changeset da10196b94f4 by Richard Oudkerk in branch 'default':
Issue #19565: Prevent warnings at shutdown about pending overlapped ops.
http://hg.python.org/cpython/rev/da10196b94f4
msg203258 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2013-11-18 08:42
The initial issue (RuntimeError messages) has been fixed, I'm closing the issue.
History
Date User Action Args
2013-11-18 08:42:03vstinnersetstatus: open -> closed
resolution: fixed
messages: + msg203258
2013-11-17 13:24:30python-devsetnosy: + python-dev
messages: + msg203145
2013-11-13 16:51:40sbtsetmessages: + msg202761
2013-11-13 15:07:44vstinnersetmessages: + msg202758
2013-11-13 13:42:45sbtsetfiles: + dealloc-runtimeerror.patch
keywords: + patch
messages: + msg202746
2013-11-13 10:55:46sbtsetmessages: + msg202735
2013-11-13 10:20:00vstinnersetmessages: + msg202734
2013-11-13 10:08:28sbtsetmessages: + msg202732
2013-11-13 00:20:45gvanrossumsetnosy: - gvanrossum
2013-11-13 00:13:03vstinnersetmessages: + msg202726
2013-11-12 22:11:38vstinnercreate