multiprocesing.Queue silently ignore messages after exc in _feeder #74599

Closed
grzgrzgrz3 mannequin opened this issue May 20, 2017 · 12 comments
Labels
3.7 (EOL) end of life stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error

Comments

@grzgrzgrz3
Mannequin

grzgrzgrz3 mannequin commented May 20, 2017

BPO 30414
Nosy @pitrou, @vstinner, @applio, @zhangyangyu, @tomMoral, @grzgrzgrz3
PRs
  • bpo-30414: multiprocessing.Queue._feed do not break from main loop on exc #1683
  • [3.6] bpo-30414: multiprocessing.Queue._feed do not break from main loop on exc (GH-1683) #1815
  • [3.5] bpo-30414: multiprocessing.Queue._feed do not break from main loop on exc (GH-1683) #1816
  • [2.7] bpo-30414: multiprocessing.Queue._feed do not break from main loop on exc (GH-1683) #1817
  • bpo-30006 More robust concurrent.futures.ProcessPoolExecutor #1013
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    GitHub fields:

    assignee = None
    closed_at = <Date 2017-05-25.15:54:28.329>
    created_at = <Date 2017-05-20.18:26:50.921>
    labels = ['3.7', 'type-bug', 'library']
    title = 'multiprocesing.Queue silently ignore messages after exc in _feeder'
    updated_at = <Date 2017-06-09.12:29:15.767>
    user = 'https://github.com/grzgrzgrz3'

    bugs.python.org fields:

    activity = <Date 2017-06-09.12:29:15.767>
    actor = 'vstinner'
    assignee = 'none'
    closed = True
    closed_date = <Date 2017-05-25.15:54:28.329>
    closer = 'pitrou'
    components = ['Library (Lib)']
    creation = <Date 2017-05-20.18:26:50.921>
    creator = 'grzgrzgrz3'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 30414
    keywords = []
    message_count = 12.0
    messages = ['294044', '294399', '294401', '294481', '294485', '294486', '294492', '294493', '295022', '295040', '295116', '295521']
    nosy_count = 6.0
    nosy_names = ['pitrou', 'vstinner', 'davin', 'xiang.zhang', 'tomMoral', 'grzgrzgrz3']
    pr_nums = ['1683', '1815', '1816', '1817', '1013']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue30414'
    versions = ['Python 2.7', 'Python 3.5', 'Python 3.6', 'Python 3.7']

    multiprocessing.Queue runs a background feeder thread. The feeder serializes buffered data and sends it to the pipe.

    The issue is with exception handling: the feeder catches all exceptions, but outside the main loop, so once an exception has been handled the feeder does not return to the loop and the thread finishes. When the feeder thread is no longer running, any Queue.put still executes without raising, but the message is never delivered.

    The solution is to move the exception handling inside the main loop. I will provide a PR.

    I have run performance tests (found in bpo-17025), and the submitted patch does not affect performance.
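
To make the difference concrete, here is a minimal sketch of the two loop shapes described above. It is a simplified stand-in, not the actual `Queue._feed` code: the function names, the plain list used as a buffer, and the `send` callback are all assumptions made for illustration.

```python
import pickle


def feed_try_outside_loop(buffer, send):
    """Pre-fix shape (hypothetical): the try wraps the whole loop."""
    try:
        for obj in buffer:
            send(pickle.dumps(obj))
    except Exception as exc:
        # The loop is already gone here, so later items are never sent.
        print("error in feeder:", exc)


def feed_try_inside_loop(buffer, send):
    """Post-fix shape (hypothetical): the try wraps a single item."""
    for obj in buffer:
        try:
            send(pickle.dumps(obj))
        except Exception as exc:
            # Only the failing item is dropped; the loop keeps going.
            print("error in feeder:", exc)


# A lambda cannot be pickled, so it plays the role of the bad message.
items = ["a", lambda: None, "b"]

sent_old = []
feed_try_outside_loop(items, sent_old.append)

sent_new = []
feed_try_inside_loop(items, sent_new.append)

print([pickle.loads(b) for b in sent_old])  # ['a']
print([pickle.loads(b) for b in sent_new])  # ['a', 'b']
```

With the try outside the loop, everything after the bad item is lost, which matches the silent message loss reported here.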

    @grzgrzgrz3 grzgrzgrz3 mannequin added 3.7 (EOL) end of life stdlib Python modules in the Lib dir labels May 20, 2017
    @pitrou
    Member

    pitrou commented May 24, 2017

    Can you expand on which exceptions you are getting in the feeder thread?

    @pitrou pitrou added the type-bug An unexpected behavior, bug, or error label May 24, 2017
    @pitrou
    Member

    pitrou commented May 24, 2017

    Never mind, I saw the PR and the test case.

    @pitrou
    Member

    pitrou commented May 25, 2017

    New changeset bc50f03 by Antoine Pitrou (grzgrzgrz3) in branch 'master':
    bpo-30414: multiprocessing.Queue._feed do not break from main loop on exc (#1683)
    bc50f03

    @pitrou
    Member

    pitrou commented May 25, 2017

    New changeset 2783cc4 by Antoine Pitrou in branch '3.6':
    [3.6] bpo-30414: multiprocessing.Queue._feed do not break from main loop on exc (GH-1683) (#1815)
    2783cc4

    @pitrou
    Member

    pitrou commented May 25, 2017

    New changeset 89004d7 by Antoine Pitrou in branch '3.5':
    [3.5] bpo-30414: multiprocessing.Queue._feed do not break from main loop on exc (GH-1683) (#1816)
    89004d7

    @pitrou
    Member

    pitrou commented May 25, 2017

    New changeset bdd9647 by Antoine Pitrou in branch '2.7':
    [2.7] bpo-30414: multiprocessing.Queue._feed do not break from main loop on exc (GH-1683) (#1817)
    bdd9647

    @pitrou
    Member

    pitrou commented May 25, 2017

    This is committed and pushed, thank you!

    @pitrou pitrou closed this as completed May 25, 2017
    @tomMoral
    Mannequin

    tomMoral mannequin commented Jun 2, 2017

    This fix, while preventing the Queue from crashing, does not give any way to programmatically detect that a message was dropped. This is a problem, as we can no longer assume that the Queue will not drop messages. For instance, we can no longer detect deadlocks in concurrent.futures.ProcessPoolExecutor as done in #1013, where the crashed QueueFeederThread was used to monitor the working state of the executor.

    We could either:

    • Put a flag highlighting the fact that some messages were dropped.
    • Add an argument to the Queue to close on pickling errors.

    I'd be happy to work on a PR to implement any solution that you think is reasonable.
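
For discussion, the two options above could look roughly like this. It is a toy model only: `ToyQueue`, the `dropped` counter, and the `close_on_error` argument are hypothetical names, not part of multiprocessing, and the toy pickles synchronously in `put` whereas the real Queue pickles later in the feeder thread.

```python
import pickle


class ToyQueue:
    """Toy stand-in for a queue; NOT multiprocessing.Queue."""

    def __init__(self, close_on_error=False):
        self.close_on_error = close_on_error  # option 2 (hypothetical argument)
        self.dropped = 0                      # option 1 (hypothetical flag/counter)
        self.closed = False
        self.sent = []

    def put(self, obj):
        if self.closed:
            raise ValueError("queue is closed")
        try:
            self.sent.append(pickle.dumps(obj))
        except Exception:
            # The message is dropped, but the loss is now observable.
            self.dropped += 1
            if self.close_on_error:
                self.closed = True


# Option 1: keep going, but expose how many messages were lost.
lenient = ToyQueue()
for item in ["a", lambda: None, "b"]:
    lenient.put(item)

# Option 2: refuse further messages after the first pickling error.
strict = ToyQueue(close_on_error=True)
strict.put("a")
strict.put(lambda: None)
try:
    strict.put("b")
    rejected = False
except ValueError:
    rejected = True

print(lenient.dropped, len(lenient.sent))  # 1 2
print(strict.dropped, rejected)            # 1 True
```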

    @pitrou
    Member

    pitrou commented Jun 2, 2017

    Thomas, thanks for the heads-up. I would suggest something like the following patch to multiprocessing.Queue:

    $ git diff
    diff --git a/Lib/multiprocessing/queues.py b/Lib/multiprocessing/queues.py
    index 7f77837..ebbb360 100644
    --- a/Lib/multiprocessing/queues.py
    +++ b/Lib/multiprocessing/queues.py
    @@ -260,8 +260,16 @@ class Queue(object):
                         info('error in queue thread: %s', e)
                         return
                     else:
    -                    import traceback
    -                    traceback.print_exc()
    +                    self._on_queue_thread_error(e)
    +
    +    def _on_queue_thread_error(self, e):
    +        """
    +        Private API called when feeding data in the background thread
    +        raises an exception.  For overriding by concurrent.futures.
    +        """
    +        import traceback
    +        traceback.print_exc()
    +
     
     _sentinel = object()
     

    Then you can write your own Queue subclass in concurrent.futures to handle that error and clean up/restart whatever needs to be cleaned up or restarted. What do you think?

    @tomMoral
    Mannequin

    tomMoral mannequin commented Jun 4, 2017

    I think this is a good solution, as it lets users easily define the behavior they need in other situations too. I would recommend also passing the object responsible for the failure to the _on_queue_thread_error callback. This would simplify the error handling.

    @@ -260,8 +260,16 @@ class Queue(object):
                         info('error in queue thread: %s', e)
                         return
                     else:
    -                    import traceback
    -                    traceback.print_exc()
    +                    self._on_queue_thread_error(e, obj)
    +
    +    def _on_queue_thread_error(self, e, obj):
    +        """
    +        Private API called when feeding data in the background thread
    +        raises an exception.  For overriding by concurrent.futures.
    +        """
    +        import traceback
    +        traceback.print_exc()
    +
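
As a rough illustration of how a subclass might use such a hook, here is a self-contained toy model. It does not touch multiprocessing: `ToyQueue`, `RecordingQueue`, the `delivered` and `failed` lists, and the `close` method are hypothetical, and the hook simply borrows the `_on_queue_thread_error(e, obj)` name and signature proposed above.

```python
import pickle
import queue
import threading
import traceback

_sentinel = object()


class ToyQueue:
    """Toy feeder thread with an error hook; NOT multiprocessing.Queue."""

    def __init__(self):
        self._buffer = queue.Queue()
        self.delivered = []
        self._thread = threading.Thread(target=self._feed, daemon=True)
        self._thread.start()

    def put(self, obj):
        self._buffer.put(obj)

    def close(self):
        # Ask the feeder to stop, then wait until it has drained the buffer.
        self._buffer.put(_sentinel)
        self._thread.join()

    def _feed(self):
        while True:
            obj = self._buffer.get()
            if obj is _sentinel:
                return
            try:
                self.delivered.append(pickle.dumps(obj))
            except Exception as e:
                # The try sits inside the loop, so the thread survives.
                self._on_queue_thread_error(e, obj)

    def _on_queue_thread_error(self, e, obj):
        # Default behavior: just print the traceback.
        traceback.print_exc()


class RecordingQueue(ToyQueue):
    """Subclass that records the failing object instead of printing."""

    def __init__(self):
        self.failed = []  # set before the feeder thread starts
        super().__init__()

    def _on_queue_thread_error(self, e, obj):
        self.failed.append((obj, e))


bad = lambda: None  # lambdas cannot be pickled
q = RecordingQueue()
for item in ("a", bad, "b"):
    q.put(item)
q.close()

print([pickle.loads(b) for b in q.delivered])  # ['a', 'b']
print(len(q.failed), q.failed[0][0] is bad)    # 1 True
```

Receiving `obj` alongside the exception lets the subclass decide per message what to do, for example failing the one work item that could not be sent.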

    @vstinner
    Member

    vstinner commented Jun 9, 2017

    A few days ago we started getting *random* failures of test_queue_feeder_donot_stop_onexc() in test_multiprocessing_spawn. They may be related to this change. Can you please take a look at bpo-30595?

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022