Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WeakValueDictionary next bug (with multithreading) #72613

Closed
arigo mannequin opened this issue Oct 13, 2016 · 20 comments
Closed

WeakValueDictionary next bug (with multithreading) #72613

arigo mannequin opened this issue Oct 13, 2016 · 20 comments
Labels
3.7 (EOL) end of life stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error

Comments

@arigo
Copy link
Mannequin

arigo mannequin commented Oct 13, 2016

BPO 28427
Nosy @tim-one, @freddrake, @arigo, @pitrou, @larryhastings, @methane, @serhiy-storchaka
Files
  • test.py
  • issue28427.py
  • issue28427.patch: Patch that fixes a part of the bug
  • issue28427-2.patch
  • issue28427-3.patch
  • issue28427-atomic.patch
  • issue28427-3.6.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2016-12-27.14:11:09.208>
    created_at = <Date 2016-10-13.10:36:29.136>
    labels = ['3.7', 'type-bug', 'library']
    title = 'WeakValueDictionary next bug (with multithreading)'
    updated_at = <Date 2017-08-02.14:08:14.442>
    user = 'https://github.com/arigo'

    bugs.python.org fields:

    activity = <Date 2017-08-02.14:08:14.442>
    actor = 'dubiousjim'
    assignee = 'none'
    closed = True
    closed_date = <Date 2016-12-27.14:11:09.208>
    closer = 'pitrou'
    components = ['Library (Lib)']
    creation = <Date 2016-10-13.10:36:29.136>
    creator = 'arigo'
    dependencies = []
    files = ['45072', '45082', '45083', '45215', '45709', '45710', '45965']
    hgrepos = []
    issue_num = 28427
    keywords = ['patch']
    message_count = 20.0
    messages = ['278555', '278587', '278637', '279380', '282053', '282054', '282087', '282088', '282089', '282407', '282409', '282428', '282432', '282434', '283625', '284096', '284100', '284101', '298283', '299659']
    nosy_count = 9.0
    nosy_names = ['tim.peters', 'fdrake', 'arigo', 'pitrou', 'larry', 'methane', 'python-dev', 'serhiy.storchaka', 'dubiousjim']
    pr_nums = []
    priority = 'critical'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue28427'
    versions = ['Python 2.7', 'Python 3.5', 'Python 3.6', 'Python 3.7']

    @arigo
    Copy link
    Mannequin Author

    arigo mannequin commented Oct 13, 2016

    Follow-up on http://bugs.python.org/issue19542.

    Another crash of using WeakValueDictionary() in a thread-local fashion inside a multi-threaded program. I must admit I'm not exactly sure why this occurs, but it is definitely showing an issue: two threads independently create their own WeakValueDictionary() and try to set one item in it. The problem I get is that the "assert 42 in d" sometimes fails, even though 42 was set in that WeakValueDictionary on the previous line and the value is still alive. This only occurs if there is a cycle of references involving the value. See attached file.

    Reproduced on Python 2.7, 3.3, 3.5, 3.6-debug.

    @arigo arigo mannequin added the type-bug An unexpected behavior, bug, or error label Oct 13, 2016
    @serhiy-storchaka serhiy-storchaka added the 3.7 (EOL) end of life label Oct 13, 2016
    @serhiy-storchaka
    Copy link
    Member

    Here is simpler reproducer for Python 3. One thread updates WeakValueDictionary in a loop, other threads runs garbage collecting in a loop. Values are collected asynchronously and this can cause removing new value by old key. Following patch fixes this example (or at least makes race condition much less likely). But it doesn't fix the entire issue. If add list(d) after setting a new value, the example fails again.

    @arigo
    Copy link
    Mannequin Author

    arigo mannequin commented Oct 14, 2016

    I'll admit I don't know how to properly fix this issue. What I came up with so far would need an atomic compare_and_delete operation on the dictionary self.data, so that we can do atomically:

    + elif self.data[wr.key] is wr:
    del self.data[wr.key]

    @serhiy-storchaka
    Copy link
    Member

    Following patch fixes more cases. But I don't think it fixes race conditions, it just makes them less likely.

    Increased priority because this bug makes weakref.WeakValueDictionary unusable in multithread program.

    @serhiy-storchaka serhiy-storchaka added the stdlib Python modules in the Lib dir label Oct 25, 2016
    @pitrou
    Copy link
    Member

    pitrou commented Nov 29, 2016

    One possibility would be to always delay removals (always put them in _pending_removals).
    We would then have to enforce removals from time to time, but synchronously.

    @pitrou
    Copy link
    Member

    pitrou commented Nov 29, 2016

    (or we bite the bullet and add a C helper function for the atomic test-and-delete thing)

    @pitrou
    Copy link
    Member

    pitrou commented Nov 30, 2016

    Here is a pure Python patch.

    @pitrou
    Copy link
    Member

    pitrou commented Nov 30, 2016

    Note the issue with this patch is that it may keep keys (with dead values) alive longer than necessary:

    • this may prevent memory consumption from decreasing
    • this may keep alive some system resources

    This is ok when keys are small simple objects (strings or tuples), though.

    @pitrou
    Copy link
    Member

    pitrou commented Nov 30, 2016

    Here is a patch showing the "atomic C function" approach. It will avoid the aforementioned memory growth in the common case, in exchange for a small bit of additional C code.

    @arigo
    Copy link
    Mannequin Author

    arigo mannequin commented Dec 5, 2016

    bpo-28427-atomic.patch: is it still necessary to modify weakref.py so much, then?

    What I had in mind was a C function with Python signature "del_if_equal(dict, key, value)"; the C function doesn't need to know about weakrefs and checking if they are dead. The C function would simply call PyObject_GetItem() and PyObject_DelItem()---without releasing the GIL in the middle.

    @pitrou
    Copy link
    Member

    pitrou commented Dec 5, 2016

    Hi Armin,

    is it still necessary to modify weakref.py so much, then?

    Not sure. I'll take a look again. Modifying __len__() at least is necessary, as the previous version took into account the length of _pending_removals (and could therefore return wrong results). I'm inclined to be a bit defensive here.

    The C function would simply call PyObject_GetItem() and PyObject_DelItem()---without releasing the GIL in the middle.

    If you implement it like that, and the dictionary has non-trivial keys with a user-defined __hash__, then the GIL can be released at the beginning of PyObject_DelItem().

    @arigo
    Copy link
    Mannequin Author

    arigo mannequin commented Dec 5, 2016

    I think the issue of __len__() is a different matter. More below.

    > The C function would simply call PyObject_GetItem() and
    > PyObject_DelItem()---without releasing the GIL in the middle.

    If you implement it like that, and the dictionary has non-trivial
    keys with a user-defined __hash__, then the GIL can be released
    at the beginning of PyObject_DelItem().

    Right, I see your point: your version will, in any case, remove only a dead weakref as a dictionary value. +1

    Now about __len__() returning a wrong result: it is a more complicated issue, I fear. I already patched weakref.py inside PyPy in order to pass a unit test inside CPython's test suite. This patch is to change __len__ to look like this:

        def __len__(self):
            # PyPy change: we can't rely on len(self.data) at all, because
            # the weakref callbacks may be called at an unknown later time.
            result = 0
            for wr in self.data.values():
                result += (wr() is not None)
            return result

    The problem is the delay between the death of a weakref and the actual invocation of the callback, which is systematic on PyPy but can be observed on CPython in some cases too. It means that code like this may fail:

         if list(d) == []:
             assert len(d) == 0

    because list(d) might indeed be empty, but some callbacks have not been called so far and so any simple formula in __len__() will fail to account for that. ('bpo-28427-atomic.patch' does not help for that, but I might have missed a different case where it does help.)

    @pitrou
    Copy link
    Member

    pitrou commented Dec 5, 2016

    Now about __len__() returning a wrong result: it is a more complicated issue, I fear. I already patched weakref.py inside PyPy in order to pass a unit test inside CPython's test suite. This patch is to change __len__ to look like this: [...]

    Thanks for the explanation. Yes, you are right on the principle. But there is also a general expectation that len() on an in-memory container is a O(1) operation, not O(n) - this change would break that expectation quite heavily.

    I don't know how to fix len() without losing O(1) performance. It seems we're in a bit of a quandary on this topic. However, we can still fix the other issues.

    @arigo
    Copy link
    Mannequin Author

    arigo mannequin commented Dec 5, 2016

    Agreed about fixing the other issues. I'm still unclear that we need anything more than just the _remove_dead_weakref function to do that, but also, I don't see a particular problem with adding self._commit_removals() a bit everywhere.

    About the O(1) expectation for len(): it's still unclear if it is better to give a precise answer or if an over-estimate is usually enough. I can see of no reasonable use for a precise answer---e.g. code like this

        while len(d) > 0:
            do_stuff(d.popitem())

    is broken anyway, because a weakref might really die between len(d) and d.popitem(). But on the other hand it makes tests behave strangely. Maybe the correct answer is that such tests are wrong---then I'd be happy to revert the PyPy-specific change to __len__() and fix the test instead. Or maybe weakdicts should always raise in __len__(), and instead have a method .length_upper_bound().

    @pitrou
    Copy link
    Member

    pitrou commented Dec 19, 2016

    The dict implementation in 3.6 has become very complicated, so I'd like someone to review the attached 3.6 patch. Serhiy, Inada?

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Dec 27, 2016

    New changeset b8b0718d424f by Antoine Pitrou in branch '3.5':
    Issue bpo-28427: old keys should not remove new values from
    https://hg.python.org/cpython/rev/b8b0718d424f

    New changeset 97d6616b2d22 by Antoine Pitrou in branch '3.6':
    Issue bpo-28427: old keys should not remove new values from
    https://hg.python.org/cpython/rev/97d6616b2d22

    New changeset e5ce7bdf9e99 by Antoine Pitrou in branch 'default':
    Issue bpo-28427: old keys should not remove new values from
    https://hg.python.org/cpython/rev/e5ce7bdf9e99

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Dec 27, 2016

    New changeset 9acdcafd1418 by Antoine Pitrou in branch '2.7':
    Issue bpo-28427: old keys should not remove new values from
    https://hg.python.org/cpython/rev/9acdcafd1418

    @pitrou
    Copy link
    Member

    pitrou commented Dec 27, 2016

    I've pushed the fixes now. It does introduce a small amount of additional code duplication in dictobject.c, but nothing unmanageable.

    Sidenote: all branches now have a different version of dict object each, which makes maintenance really painful...

    @pitrou pitrou closed this as completed Dec 27, 2016
    @larryhastings
    Copy link
    Contributor

    I tested this in a freshly-built 3.4.6. Although it reproduced the behavior you're complaining about--it threw the assert in Armin's test.py, and Serhiy's bpo-28427.py prints an admonishing FAIL--neither test *crashes* CPython. So I'm not convinced either of these is a *security* risk. This is a bug, and 3.4 isn't open for bugfixes, so I don't plan to accept a backport for this in 3.4.

    If this is a crashing bug, please tell me how to reproduce the crashing bug with 3.4.6.

    @dubiousjim
    Copy link
    Mannequin

    dubiousjim mannequin commented Aug 2, 2017

    In response to Issue bpo-7105, self._pending_removals was added to WeakValueDictionaries (and also WeakKeyDictionaries, but they're not relevant to what I'm about to discuss). This was in changesets 58194 to tip and 58195 to 3.1, back in Jan 2010. In those changesets, the implementation of WeakValueDictionary.setdefault acquired a check on self._pending_removals, but only after the key lookup had failed. (See lines starting 5.127 in both those changesets.)

    In changeset 87778, in Dec 2013, this same patch was backported to 2.7.

    More recently, in response to the issue discussed above (Issue bpo-28427), similar checks were added to WeakValueDictionary.get, but now BEFORE the key lookup. This was in changesets 105851 to 3.5, 105852 to 3.6, 105853 to tip, and 105854 to 2.7, in Dec 2016. Notably, in the last changeset, the check on self._pending_removals on WeakValueDictionary.setdefault is also moved to the top of the function, before the key lookup is attempted. This parallels the change being made to WeakValueDictionary.get.

    However, that change to WeakValueDictionary.setdefault was only made to the 2.7 branch. If it's correct, then why wasn't the same also done for 3.5, 3.6, and tip?

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.7 (EOL) end of life stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error
    Projects
    None yet
    Development

    No branches or pull requests

    3 participants