Author josh.r
Recipients josh.r, pablogsal, vstinner, xtreak
Date 2019-02-12.18:59:49
SpamBayes Score -1.0
Marked as misclassified Yes
Message-id <1549997989.75.0.36622959368.issue35961@roundup.psfhosted.org>
In-reply-to
Content
Victor found the same bug I found while I was composing this, posting only to incorporate proposed solution:

I *think* I have a cause for this, but someone else with greater understanding of the cycle collector should check me if the suggested fix has non-trivial performance implications (I suspect the answer is no, performance is unaffected).

slice_richcompare borrows its behavior from tuple by creating a temporary tuple for each slice, the delegating to the tuple comparison ( https://github.com/python/cpython/blob/master/Objects/sliceobject.c#L591 ).

Problem is, it uses borrowed references when creating said tuples, not owned references. Because test_slice's BadCmp.__eq__ is implemented in Python, the comparison can be interrupted by cycle collection during the __eq__ call. When then happens, there are precisely two references to the BadCmp object:

1. In the slice (owned)
2. In the temporary tuple (borrowed)

When a cycle collection occurs during the comparison, and subtract_refs ( https://github.com/python/cpython/blob/master/Modules/gcmodule.c#L398 ) is called, the BadCmp object in question is visited via both the slice and the tuple, and since it has no non-container objects referencing it, it ends up with the initial reference count of 1 attempting to drop to -1, and the assertion is violated. While the code of gcmodule.c appears to have been refactored since 3.7 so the assert occurs in a different function, with a slightly different message, it would break the same way in both 3.7 and master, and whether or not it triggers the bug, the broken behavior of slice_richcompare hasn't changed for a *long* time. 

Underlying problem would seem to be slice's richcompare believing it's okay to make a tuple from borrowed references, then make a call on it that can trigger calls into Python level code (and therefore into the cycle collector); everything else is behaving correctly here. I'm guessing the only reason it's not seen in the wild is that slices based on Python defined types are almost never compared at all, let alone compared on debug builds that would be checking the assert and with an accelerated cycle collection cycle that would make a hit likely.

Solution would be to stop trying to microoptimize slice_richcompare to avoid reference count manipulation and just build a proper tuple. It would even simplify the code since we could just use PyTuple_Pack, reducing custom code by replacing:

    t1 = PyTuple_New(3);
    if (t1 == NULL)
        return NULL;
    t2 = PyTuple_New(3);
    if (t2 == NULL) {
        Py_DECREF(t1);
        return NULL;
    }

    PyTuple_SET_ITEM(t1, 0, ((PySliceObject *)v)->start);
    PyTuple_SET_ITEM(t1, 1, ((PySliceObject *)v)->stop);
    PyTuple_SET_ITEM(t1, 2, ((PySliceObject *)v)->step);
    PyTuple_SET_ITEM(t2, 0, ((PySliceObject *)w)->start);
    PyTuple_SET_ITEM(t2, 1, ((PySliceObject *)w)->stop);
    PyTuple_SET_ITEM(t2, 2, ((PySliceObject *)w)->step);

with:

    t1 = PyTuple_Pack(3, ((PySliceObject *)v)->start, ((PySliceObject *)v)->stop, ((PySliceObject *)v)->step);
    if (t1 == NULL)
        return NULL;
    t2 = PyTuple_Pack(3, ((PySliceObject *)w)->start, ((PySliceObject *)w)->stop, ((PySliceObject *)w)->step);
    if (t2 == NULL) {
        Py_DECREF(t1);
        return NULL;
    }

and makes cleanup simpler, since you can just delete:

    PyTuple_SET_ITEM(t1, 0, NULL);
    PyTuple_SET_ITEM(t1, 1, NULL);
    PyTuple_SET_ITEM(t1, 2, NULL);
    PyTuple_SET_ITEM(t2, 0, NULL);
    PyTuple_SET_ITEM(t2, 1, NULL);
    PyTuple_SET_ITEM(t2, 2, NULL);

and let the DECREFs for t1/t2 do their work normally.

If for some reason the reference count manipulation is unacceptable, this *could* switch between two behaviors depending on whether or not start/stop/step are of known types (e.g. if all are NoneType/int, this could use the borrowed refs code path safely) where a call back into Python level code is impossible; given that slices are usually made of None and/or ints, this would remove most of the cost for the common case, at the expense of more complicated code. Wouldn't help numpy types though, and I suspect the cost of pre-checking the types for all six values involved would eliminate most of the savings.

Sorry for not submitting a proper PR; the work machine I use during the day is not suitable for development (doesn't even have Python installed).
History
Date User Action Args
2019-02-12 18:59:49josh.rsetrecipients: + josh.r, vstinner, pablogsal, xtreak
2019-02-12 18:59:49josh.rsetmessageid: <1549997989.75.0.36622959368.issue35961@roundup.psfhosted.org>
2019-02-12 18:59:49josh.rlinkissue35961 messages
2019-02-12 18:59:49josh.rcreate