Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

multiprocessing.Pool.imaps iterators do not maintain alive the multiprocessing.Pool objects #79559

Closed
pablogsal opened this issue Dec 2, 2018 · 22 comments
Labels
3.8 only security fixes stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error

Comments

@pablogsal
Copy link
Member

BPO 35378
Nosy @pitrou, @vstinner, @ericvw, @applio, @tzickel, @pablogsal
PRs
  • bpo-35378: Link the lifetime of the pool to the pool's iterators and results #10852
  • bpo-35378: Fix multiprocessing.Pool references #11627
  • bpo-35378: Fix multiprocessing.Pool references #11627
  • bpo-35378: Fix multiprocessing.Pool references #11627
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2019-02-11.17:47:40.589>
    created_at = <Date 2018-12-02.16:43:32.510>
    labels = ['3.8', 'type-bug', 'library']
    title = 'multiprocessing.Pool.imaps iterators do not maintain alive the multiprocessing.Pool objects'
    updated_at = <Date 2019-02-12.19:39:32.780>
    user = 'https://github.com/pablogsal'

    bugs.python.org fields:

    activity = <Date 2019-02-12.19:39:32.780>
    actor = 'vstinner'
    assignee = 'none'
    closed = True
    closed_date = <Date 2019-02-11.17:47:40.589>
    closer = 'pablogsal'
    components = ['Library (Lib)']
    creation = <Date 2018-12-02.16:43:32.510>
    creator = 'pablogsal'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 35378
    keywords = ['patch']
    message_count = 22.0
    messages = ['330890', '330893', '330895', '330896', '330899', '331214', '331224', '331225', '331226', '331227', '331228', '331229', '331442', '331444', '331452', '331725', '333012', '333573', '333582', '333594', '335242', '335358']
    nosy_count = 6.0
    nosy_names = ['pitrou', 'vstinner', 'ericvw', 'davin', 'tzickel', 'pablogsal']
    pr_nums = ['10852', '11627', '11627', '11627']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue35378'
    versions = ['Python 3.8']

    @pablogsal
    Copy link
    Member Author

    After applying the PRs in bpo-34172, multiprocessing.Pool.imap hangs on MacOs and Linux. This is a simple reproducer:

    import multiprocessing
    
    def the_test():
        print("Begin")
        for x in multiprocessing.Pool().imap(int,
                ["4", "3"]):
            print(x)
        print("End")
    
    the_test()

    This happens because the IMapIterator does not maintain alive the multiprocessing.Pool object while it is still alive.

    @pablogsal pablogsal added 3.7 (EOL) end of life 3.8 only security fixes stdlib Python modules in the Lib dir labels Dec 2, 2018
    @tzickel
    Copy link
    Mannequin

    tzickel mannequin commented Dec 2, 2018

    It's important to note that before those PR, that code would leak the Pool instance until the process ends (once per call).

    master...tzickel:fix34172

    Is my proposed fix (till I get it to a PR).

    @pablogsal
    Copy link
    Member Author

    I was working already on a PR. Do you prefer to wait for yours instead?

    @tzickel
    Copy link
    Mannequin

    tzickel mannequin commented Dec 2, 2018

    I dont mind, I think my code is ready for review, but I'm not versed in this, so if you think you have something better, feel free to open a PR or tell me if I should submit mine, and you can comment on it:

    master...tzickel:fix34172

    @pablogsal
    Copy link
    Member Author

    I think the code is almost ok (I also prefer to also use the cache as an excuse to maintain the pool alive) but the test needs to be done a bit more carefully to avoid hanging the test suite in case of failure and to avoid leaking threads or processes.

    @pitrou
    Copy link
    Member

    pitrou commented Dec 6, 2018

    Let's step back a bit here. This kind of code has never been supported. As Victor says, we should be careful not to add any potential sources of reference cycles.

    The reason the code originally "worked" is that it actually leaked the Pool object (and presumably its worker threads and processes) until the end of the application.

    @pitrou pitrou added type-bug An unexpected behavior, bug, or error and removed 3.7 (EOL) end of life labels Dec 6, 2018
    @pablogsal
    Copy link
    Member Author

    Although I think keeping the iterator is not a bad solution if done correctly, I think more and more that is not the best solution.

    @antoine, would you be ok passing a weak reference to the iterator and raising if the pool is dead?

    I still think we should try to avoid hanging.

    @vstinner
    Copy link
    Member

    vstinner commented Dec 6, 2018

    In general, I'm fine with adding a *weak* reference, especially if it helps to detect bugs.

    @pitrou
    Copy link
    Member

    pitrou commented Dec 6, 2018

    I think a weakref is fine. You don't have to *pass* a weakref: just pass a normal ref and let the Result object take a weakref (and a reference to the cache perhaps).

    @vstinner
    Copy link
    Member

    vstinner commented Dec 6, 2018

    See https://bugs.python.org/issue34172#msg331216 for a more general discussion about how the multiprocessing API is supposed tobe used and multiprocessing objects lifetime.

    @pablogsal
    Copy link
    Member Author

    Ok, I will modify my PR to pass a weak reference and raise.

    @pitrou
    Copy link
    Member

    pitrou commented Dec 6, 2018

    I agree about making the multiprocessing API safer to use, but please let's have this discussion on a dedicated bug entry.

    @pablogsal
    Copy link
    Member Author

    I am playing with weakreferences inside the iterator objects, but this may not be enough. For example, take the code of ApplyResult.get:

    def get(self, timeout=None):
        if self._pool() is None:
            raise RuntimeError("The pool is dead!") <--- new code
        self.wait(timeout)

    It can be that the pool is alive when we check for it (self._pool() is None) but while the code is waiting with no timeout, the pool dies, effectively leaving the program deadlocked with no error.

    @pablogsal
    Copy link
    Member Author

    Also, making tests that do not leak threads for this case is very difficult as if we go with weakrefs, the test *has* to leak a pool (the pool is dead but never calls join/close) to check that when you use the iterator the exception happens. Also, getting such test race-free is going to be challenging.

    @tzickel
    Copy link
    Mannequin

    tzickel mannequin commented Dec 9, 2018

    @vstinner
    Copy link
    Member

    See also bpo-35478: multiprocessing: ApplyResult.get() hangs if the pool is terminated.

    @vstinner
    Copy link
    Member

    vstinner commented Jan 4, 2019

    bpo-35629 has been marked as a duplicate of this issue.

    @pablogsal
    Copy link
    Member Author

    I have been playing with possible solutions for a while and the weak-reference solution seems not robust enough as there are too potential race conditions between the destruction of the weakreferences (the pool) and the handling code.

    I would advocate again for using a strong reference. The reasons are:

    • The rest of the stdlib is using this solution to link iterator objects and similar to their parents (lists, dicts, sets...etc all have strong references back to their parents).

    • This solution is backwards compatible with the current behaviour.

    • We have the new ResourceWarnigns to make clear that this behaviour is not supported.

    @tzickel
    Copy link
    Mannequin

    tzickel mannequin commented Jan 14, 2019

    +1

    @vstinner
    Copy link
    Member

    According to all previous discussions, I now agree with having a strong reference.

    @pablogsal
    Copy link
    Member Author

    New changeset 3766f18 by Pablo Galindo in branch 'master':
    bpo-35378: Fix multiprocessing.Pool references (GH-11627)
    3766f18

    @vstinner
    Copy link
    Member

    Thanks Pablo for the fix ;-)

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.8 only security fixes stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error
    Projects
    None yet
    Development

    No branches or pull requests

    3 participants