test_multiprocessing_spawn: RuntimeError and assertion error on windows xp buildbot #63764

Closed
vstinner opened this issue Nov 12, 2013 · 10 comments

Comments

@vstinner
Member

BPO 19565
Nosy @vstinner
Files
  • dealloc-runtimeerror.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = None
    closed_at = <Date 2013-11-18.08:42:03.880>
    created_at = <Date 2013-11-12.22:11:38.452>
    labels = []
    title = 'test_multiprocessing_spawn: RuntimeError and assertion error on windows xp buildbot'
    updated_at = <Date 2013-11-18.08:42:03.879>
    user = 'https://github.com/vstinner'

    bugs.python.org fields:

    activity = <Date 2013-11-18.08:42:03.879>
    actor = 'vstinner'
    assignee = 'none'
    closed = True
    closed_date = <Date 2013-11-18.08:42:03.880>
    closer = 'vstinner'
    components = []
    creation = <Date 2013-11-12.22:11:38.452>
    creator = 'vstinner'
    dependencies = []
    files = ['32597']
    hgrepos = []
    issue_num = 19565
    keywords = ['patch']
    message_count = 10.0
    messages = ['202725', '202726', '202732', '202734', '202735', '202746', '202758', '202761', '203145', '203258']
    nosy_count = 4.0
    nosy_names = ['vstinner', 'neologix', 'python-dev', 'sbt']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = None
    status = 'closed'
    superseder = None
    type = None
    url = 'https://bugs.python.org/issue19565'
    versions = ['Python 3.4']

    @vstinner
    Member Author

    Since changeset c2a13acd5e2b24560419b93180ee49d1a4839b92 ("Close bpo-19466: Clear the frames of daemon threads earlier during the Python shutdown to call objects destructors"), test_multiprocessing_spawn now shows RuntimeError messages, and a child process crashes with an assertion error on the Windows XP buildbot.

    http://buildbot.python.org/all/builders/x86 XP-4 3.x/builds/9538/steps/test/logs/stdio

    [ 71/385] test_multiprocessing_spawn
    RuntimeError: I/O operations still in flight while destroying Overlapped object, the process may crash
    RuntimeError: I/O operations still in flight while destroying Overlapped object, the process may crash
    RuntimeError: I/O operations still in flight while destroying Overlapped object, the process may crash
    RuntimeError: I/O operations still in flight while destroying Overlapped object, the process may crash
    RuntimeError: I/O operations still in flight while destroying Overlapped object, the process may crash
    RuntimeError: I/O operations still in flight while destroying Overlapped object, the process may crash
    RuntimeError: I/O operations still in flight while destroying Overlapped object, the process may crash
    RuntimeError: I/O operations still in flight while destroying Overlapped object, the process may crash
    RuntimeError: I/O operations still in flight while destroying Overlapped object, the process may crash
    RuntimeError: I/O operations still in flight while destroying Overlapped object, the process may crash
    RuntimeError: I/O operations still in flight while destroying Overlapped object, the process may crash
    Assertion failed: !PyErr_Occurred(), file ..\Python\ceval.c, line 4077

    @vstinner
    Member Author

    I'm able to reproduce the RuntimeError on Windows 7; it comes from a pipe. The message is probably written by a child process, not by the main process. I suppose that Richard knows better than I do how to fix this warning, so I don't want to investigate it :-)

    I'm unable to reproduce the "Assertion failed: !PyErr_Occurred(), file ..\Python\ceval.c, line 4077" failure on Windows 7 on my AMD64 with Python compiled in debug mode in 32-bit mode (I only have Visual Studio Express, so no 64-bit binary). I'm interested in this one, but I need a traceback, the C traceback if possible.

    An option would be to enable faulthandler by monkey-patching multiprocessing.spawn.get_command_line() (to add -X faulthandler). But in my experience, the Python traceback doesn't help to investigate such an assertion error.
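The monkey-patching idea above could look roughly like the following sketch. It assumes the non-frozen case, where `multiprocessing.spawn.get_command_line()` returns a list whose first element is the interpreter path. The names `add_faulthandler_flag` and `patched_get_command_line` are hypothetical helpers for illustration and are not part of multiprocessing.

```python
import multiprocessing.spawn as spawn


def add_faulthandler_flag(cmd):
    """Return a copy of cmd with '-X faulthandler' right after the executable."""
    return [cmd[0], "-X", "faulthandler", *cmd[1:]]


# Keep a reference to the real function so it can be restored later.
_original_get_command_line = spawn.get_command_line


def patched_get_command_line(**kwds):
    # Assumption: the result looks like [python_exe, *options, '-c', prog, ...].
    # This does not hold for frozen executables, which take no -X options.
    return add_faulthandler_flag(_original_get_command_line(**kwds))


# Child processes started with the "spawn" method after this point
# would be launched with faulthandler enabled.
spawn.get_command_line = patched_get_command_line
```

Assigning `_original_get_command_line` back to `spawn.get_command_line` undoes the patch once the investigation is over.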

    I added this assertion recently in Python 3.4 to detect bugs earlier. If PyEval_CallObjectWithKeywords() is called with an exception set, the exception may be cleared or replaced with a new exception, so the original exception can be lost, which is probably not expected. For example, PyDict_GetItem() raises a KeyError and then clears the current exception.

    @sbt
    Mannequin

    sbt mannequin commented Nov 13, 2013

    If you have a pending overlapped operation then the associated buffer should not be deallocated until that operation is complete, or else you are liable to get a crash or memory corruption.

    Unfortunately WinXP provides no reliable way to cancel a pending operation -- there is CancelIo() but that just cancels operations started by the *current thread* on a handle. Vista introduced CancelIoEx() which allows cancellation of a specific overlapped op.

    These warnings happen in the deallocator because the buffer has to be freed.

    For Vista and later versions of Windows these warnings are presumably unnecessary since CancelIoEx() is used.

    For WinXP the simplest thing may be to check if _Py_Finalizing is non-null and if so suppress the warning (possibly "leaking" the buffer since we are exiting anyway).
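The decision described in this comment can be modelled in a few lines of Python. This is only a sketch of the reasoning, not the C code of the patch: `has_cancel_io_ex` and `dealloc_policy` are hypothetical names, and the three outcomes ("cancel", "leak", "warn") are labels invented for the illustration.

```python
import ctypes
import sys


def has_cancel_io_ex():
    """Return True if kernel32 exports CancelIoEx (Vista and later)."""
    if sys.platform != "win32":
        return False  # no kernel32 outside Windows
    kernel32 = ctypes.WinDLL("kernel32")
    # Attribute lookup on a ctypes DLL resolves the export by name and
    # raises AttributeError when it is missing, so hasattr() is enough.
    return hasattr(kernel32, "CancelIoEx")


def dealloc_policy(can_cancel, finalizing):
    """Choose what to do with an operation still pending at deallocation."""
    if can_cancel:
        return "cancel"  # Vista+: cancel this specific overlapped operation
    if finalizing:
        return "leak"    # XP at shutdown: keep the buffer alive, stay quiet
    return "warn"        # XP during normal operation: report the problem
```

For example, `dealloc_policy(has_cancel_io_ex(), finalizing=True)` gives "leak" on a system without CancelIoEx() and "cancel" otherwise.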

    @vstinner
    Member Author

    "For Vista and later versions of Windows these warnings are presumably unnecessary since CancelIoEx() is used."

    As with close() on regular files, I would prefer to call cancel() explicitly, to control exactly when the overlapped operation is cancelled. Can't you fix multiprocessing and/or the unit test to ensure that all overlapped operations are completed or cancelled?

    close() does flush buffers and so may fail. Is it the same for CancelIo/CancelIoEx? In the official documentation, only one error case is described: "If this function cannot find a request to cancel, the return value is 0 (zero), and GetLastError returns ERROR_NOT_FOUND."

    The warning is useful because it may be a real bug in the code. I also like ResourceWarning("unclosed file/socket ...") warnings.
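The ResourceWarning analogy can be seen with an ordinary file. The sketch below relies on CPython behaviour (the file object is destroyed as soon as the last reference goes away); `leak_file` is a hypothetical helper, and warnings are recorded rather than printed so the result can be inspected.

```python
import gc
import os
import tempfile
import warnings


def leak_file(path):
    """Open a file and return without closing it."""
    f = open(path, "w")
    f.write("data")
    # No f.close(): the destructor has to clean up and will complain.


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.txt")
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # ResourceWarning is ignored by default
        leak_file(path)
        gc.collect()  # in case the object was not freed by refcounting alone

messages = [str(w.message) for w in caught
            if issubclass(w.category, ResourceWarning)]
```

Closing the file explicitly, or using a `with` block, leaves `messages` empty, which is the kind of explicit control being asked for with cancel().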

    @sbt
    Mannequin

    sbt mannequin commented Nov 13, 2013

    > As close() on regular files, I would prefer to call explicitly cancel()
    > to control exactly when the overlapped operation is cancelled.

    If you use daemon threads then you have no guarantee that the thread will ever get a chance to explicitly call cancel().

    > Can't you fix multiprocessing and/or the unit test to ensure that all
    > overlapped operations are completed or cancelled?

    On Vista and later, yes, this is done in the deallocator using CancelIoEx(), although there is still a warning. On XP it is not possible because CancelIo() has to be called from the same thread which started the operation.

    I think these warnings come from daemon threads used by "manager" processes. When the manager process exits some background threads may be blocked doing an overlapped read.

    (It might be possible to wake up blocked threads by setting the event handle returned by _PyOS_SigintEvent(). That might allow the use of non-daemon threads.)

    @sbt
    Mannequin

    sbt mannequin commented Nov 13, 2013

    I think the attached patch should fix it. Note that with the patch the RuntimeError can probably only occur on Windows XP.

    Shall I apply it?

    @vstinner
    Member Author

    >> Can't you fix multiprocessing and/or the unit test to ensure that all
    >> overlapped operations are completed or cancelled?

    > On Vista and later, yes, this is done in the deallocator using
    > CancelIoEx(), although there is still a warning.

    I don't understand. The warning is emitted because an operation is neither done nor cancelled. Why not explicitly cancel active operations in manager.shutdown()? Is that not possible?

    > ... I think these warnings come from daemon threads used by "manager"
    > processes. When the manager process exits some background threads
    > may be blocked doing an overlapped read.

    I'm not familiar with overlapped operations. Aren't they asynchronous? What do you mean by "blocked doing an overlapped read"?

    @sbt
    Mannequin

    sbt mannequin commented Nov 13, 2013

    On 13/11/2013 3:07pm, STINNER Victor wrote:

    >> On Vista and later, yes, this is done in the deallocator using
    >> CancelIoEx(), although there is still a warning.

    > I don't understand. The warning is emitted because an operation is neither done nor cancelled. Why not explicitly cancel active operations in manager.shutdown()? Is that not possible?

    shutdown() will be run in a different thread to the ones which started
    the overlapped ops, so it cannot stop them using CancelIo(). And
    anyway, it would mean writing a separate implementation for Windows --
    the current manager implementation contains no platform specific code.

    Originally overlapped IO was not used on Windows. But, to get rid of
    polling, Antoine opened the can of worms that is overlapped IO:-)

    >> ... I think these warnings come from daemon threads used by "manager"
    >> processes. When the manager process exits some background threads
    >> may be blocked doing an overlapped read.

    > I'm not familiar with overlapped operations. Aren't they asynchronous? What do you mean by "blocked doing an overlapped read"?

    They are asynchronous but the implementation uses a hidden thread pool.
    If a pool thread tries to read from/write to a buffer that has been
    deallocated, then we can get a crash.

    By "blocked doing an overlapped read" I mean that a daemon thread is
    waiting for a line like

         data = conn.recv()

    to complete.
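The situation described here can be reproduced in miniature with a pipe and a daemon thread. This is a minimal sketch, not the manager code: `reader`, `started`, and `results` are hypothetical names, and the main thread wakes the reader at the end only so the example terminates cleanly.

```python
import threading
from multiprocessing import Pipe


def reader(conn, started, results):
    started.set()
    # Blocks here until the other end sends something. On Windows this
    # read is an overlapped operation under the hood.
    results.append(conn.recv())


parent_conn, child_conn = Pipe()
started = threading.Event()
results = []

t = threading.Thread(target=reader, args=(child_conn, started, results),
                     daemon=True)
t.start()
started.wait()
t.join(timeout=0.2)

# The thread is still parked inside recv(). If the process exited now,
# the daemon thread would be abandoned with its read still pending.
still_blocked = t.is_alive()

# Unblock the reader so it can finish normally.
parent_conn.send("wake up")
t.join(timeout=5)
```

Because the thread is a daemon, nothing forces it to reach any cleanup code before interpreter shutdown, which is why an explicit cancel() in that thread cannot be relied on.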

    @python-dev
    Mannequin

    python-dev mannequin commented Nov 17, 2013

    New changeset da10196b94f4 by Richard Oudkerk in branch 'default':
    Issue bpo-19565: Prevent warnings at shutdown about pending overlapped ops.
    http://hg.python.org/cpython/rev/da10196b94f4

    @vstinner
    Member Author

    The initial issue (RuntimeError messages) has been fixed; I'm closing the issue.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022