Author josh.r
Recipients eric.snow, josh.r, kaniini, methane, python-dev, rhettinger, serhiy.storchaka, xiang.zhang, zach.ware
Date 2016-10-25.14:31:16
SpamBayes Score -1.0
Marked as misclassified Yes
Message-id <>
Explaining expiringdict's issue: It's two race conditions, with itself, not another thread.

Example scenario (narrow race window):

1. An entry has an expiry time of X (so it will self-delete at time X or later)
2. At time X-1, the PySequence_Contains check is run, and it returns 1 (true)
3. Because contains returned True, at time X PyObject_GetItem is run, but because it's time X, expiringdict's __getitem__ deletes the entry and raises KeyError

An alternative scenario with a *huge* race window is:

1. An entry has an expiry time of X (so it will self-delete at time X or later)
2. No lookups or membership tests or any other operations that implicitly clean the expiring dict occur for a while
3. At time X+1000, _odict_FIRST (in popitem) grabs the first entry in the OrderedDict without invoking the __contains__ machinery that would delete the entry
3. At time X+1000 or so, the PySequence_Contains check is run, and it returns 0 (false), because the __contains__ machinery is invoked, and again, because no default is provided for popitem, a KeyError is raised (this time by the popitem machinery, not __getitem__)

expiringdict is unusually good at bringing this on itself. The failing popitem call is in __setitem__ for limited length expiringdicts, self.popitem(last=False), where they're intentionally removing the oldest entry, when the oldest entry is the most likely to have expired (and since __len__ isn't overridden to expire old entries, it may have been expired for quite a while).

The del self[next(OrderedDict.__iter__(self))] works because they didn't override __iter__, so it's not expiring anything to get the first item, and therefore only __delitem__ is involved, not __contains__ or __getitem__ (note: This is also why the bug they reference has an issue with "OrderedDict mutated during iteration"; iteration returns long expired keys, but looking the expired keys up deletes them, causing the mutation issue).

Possible correct fixes:

1. Make popitem _more_ subclass friendly; when it's an OrderedDict subclass, instead of using _odict_LAST and _odict_FIRST, use (C equivalent) of `next(reversed(self))` and `next(iter(self))` respectively. This won't fix expiringdict as is (because it's broken by design; it doesn't override __iter__, so it will happily return long expired keys that disappear on access), but if we're going for subclass friendliness and allowing non-idempotent __contains__/__getitem__/__iter__ implementations, it's the right thing to do. If expiringdict implemented __iter__ to copy the keys, then loop over the copy, deleting expired values and yielding unexpired values, this would at least reduce the huge race window to a narrow window (because it could still yield a key that's almost expired)
2. Check for the presence of __missing__ on the type and only perform the PySequence_Contains check if __missing__ is defined (to avoid autovivification). This fixes the narrow race condition for subclasses without __missing__, but not the huge race condition
3. Explicitly document the idempotence assumptions made by OrderedDict (specifically, that all non-mutating methods of OrderedDict must not be made mutating in subclasses unless the caller also overrides all multistep operations, e.g. pop/popitem/setdefault).

TL;DR: expiringdict is doing terrible things, assuming the superclass will handle them even though the superclass has completely different assumptions, and therefore expiringdict has only itself to blame.
Date User Action Args
2016-10-25 14:31:17josh.rsetrecipients: + josh.r, rhettinger, methane, python-dev, eric.snow, zach.ware, serhiy.storchaka, xiang.zhang, kaniini
2016-10-25 14:31:17josh.rsetmessageid: <>
2016-10-25 14:31:17josh.rlinkissue27275 messages
2016-10-25 14:31:16josh.rcreate