Message 303881 - Python tracker


Author	serhiy.storchaka
Recipients	Oren Milman, geeknik, lemburg, pitrou, serhiy.storchaka, tim.peters, twouters, vstinner
Date	2017-10-07.15:02:01
Message-id	<1507388523.0.0.213398074469.issue31165@psf.upfronthosting.co.za>

Content

See a4dd011259fa6f3079bd0efd95b3a136c0e3c190. The commit message:

    Tentative fix for a problem that Tim discovered at the last moment,
    and reported to python-dev: because we were calling dict_resize() in
    PyDict_Next(), and because GC's dict_traverse() uses PyDict_Next(),
    and because PyTuple_New() can cause GC, and because dict_items() calls
    PyTuple_New(), it was possible for dict_items() to have the dict
    resized right under its nose.
    
    The solution is convoluted, and touches several places: keys(),
    values(), items(), popitem(), PyDict_Next(), and PyDict_SetItem().
    
    There are two parts to it. First, we no longer call dict_resize() in
    PyDict_Next(), which seems to solve the immediate problem.  But then
    PyDict_SetItem() must have a different policy about when *it* calls
    dict_resize(), because we want to guarantee (e.g. for an algorithm
    that Jeremy uses in the compiler) that you can loop over a dict using
    PyDict_Next() and make changes to the dict as long as those changes
    are only value replacements for existing keys using PyDict_SetItem().
    This is done by resizing *after* the insertion instead of before, and
    by remembering the size before we insert the item, and if the size is
    still the same, we don't bother to even check if we might need to
    resize.  An additional detail is that if the dict starts out empty, we
    must still resize it before the insertion.
    
    That was the first part. :-)
    
    The second part is to make keys(), values(), items(), and popitem()
    safe against side effects on the dict caused by allocations, under the
    assumption that if the GC can cause arbitrary Python code to run, it
    can cause other threads to run, and it's not inconceivable that our
    dict could be resized -- it would be insane to write code that relies
    on this, but not all code is sane.
    
    Now, I have this nagging feeling that the loops in lookdict probably
    are blissfully assuming that doing a simple key comparison does not
    change the dict's size.  This is not necessarily true (the keys could
    be class instances after all).  But that's a battle for another day.
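
The core hazard in the commit message is that a garbage collection can run arbitrary Python code, which can then resize a container that some C function is in the middle of reading. The following is only a Python-level sketch of that effect, not the C code path itself: `Node` and `demo` are hypothetical names invented for this illustration, and the explicit `gc.collect()` call stands in for a collection that an allocation such as PyTuple_New() might trigger.

```python
import gc


class Node:
    """Hypothetical object in a reference cycle whose finalizer mutates a list."""

    def __init__(self, target):
        self.target = target
        self.me = self  # reference cycle: only the cyclic GC can reclaim this object

    def __del__(self):
        # Arbitrary Python code, executed by the collector.
        self.target.append("added by finalizer")


def demo():
    gc.disable()  # keep automatic collections out of the demonstration
    try:
        items = [1, 2, 3]
        size_before = len(items)  # a size cached "before the allocation"
        Node(items)               # unreachable at once, but kept alive by its cycle
        gc.collect()              # assumption: stands in for a GC run caused by an allocation
        return size_before, len(items)
    finally:
        gc.enable()


if __name__ == "__main__":
    before, after = demo()
    print(before, after)  # the cached size no longer matches the list
```

Any code that computed a size before the collection and trusts it afterwards is working with stale information, which is the situation the commit message guards against for dicts.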

We have the same issue with lists. PR 3915 tries to fix it by applying the same solution -- calling PyList_New() again if the source container was resized. list_slice() can no longer be considered safe, because it uses the size calculated before calling PyList_New(). The PR adds _PyList_Copy() for copying a list, as a replacement for the unsafe PyList_GetSlice(). PyList_SetSlice() is not safe either (the PR does not fix this yet). Code that uses the combination of PyList_GetSlice() and PyList_SetSlice() for safety (as in _asynciomodule.c) is not safe. A lot of code, including most implementations of slicing, would have to be rewritten if we go this way. PR 3915 shows only a small example of such changes.
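
The "allocate again if the source was resized" idea can be modelled in pure Python. This is a rough sketch of the pattern only, not the code in PR 3915: `copy_with_retry`, `allocate`, and `make_hostile_allocator` are hypothetical names, and the hostile allocator merely simulates an allocation whose side effects (for example a GC run) resize the source list. Note that the check compares sizes only, so it would not notice a mutation that leaves the length unchanged.

```python
def copy_with_retry(src, allocate):
    """Copy src into a buffer obtained from allocate(n).

    allocate(n) may run arbitrary code (modelling an allocation that
    triggers GC), so the size is re-checked after every allocation.
    """
    while True:
        n = len(src)
        buf = allocate(n)   # src may be resized during this call
        if len(src) == n:   # size unchanged: the buffer still fits
            for i in range(n):
                buf[i] = src[i]
            return buf
        # src was resized under us: discard the buffer and try again


def make_hostile_allocator(victim):
    """Hypothetical allocator that resizes `victim` during its first call."""
    calls = []

    def allocate(n):
        calls.append(n)
        if len(calls) == 1:
            victim.append("surprise")  # simulated side effect of the allocation
        return [None] * n

    return allocate, calls


if __name__ == "__main__":
    src = [1, 2, 3]
    allocate, calls = make_hostile_allocator(src)
    print(copy_with_retry(src, allocate), calls)
```

A version that used the size computed before the allocation would return a three-element copy of a list that now has four elements. The retry loop avoids that at the cost of a second allocation.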

I think that changing the garbage collector would be easier.
