classification
Title: Insufficient description of cyclic garbage collector for C API
Type: Stage:
Components: C API, Documentation Versions: Python 3.8, Python 3.7
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: docs@python Nosy List: docs@python, iabervon, nanjekyejoannah, pablogsal, tim.peters
Priority: normal Keywords:

Created on 2020-06-26 22:29 by iabervon, last changed 2020-07-01 23:44 by pablogsal.

Messages (4)
msg372445 - (view) Author: Daniel Barkalow (iabervon) Date: 2020-06-26 22:29
Nothing in the documentation gives you enough information to find what you're doing wrong if you have the following bug for an object tracked by the GC: (a) you store a borrowed reference to an object whose lifetime is always longer that the buggy object (and never decrement it); (b) you visit that object in your tp_traverse.

Despite the bug, everything works nearly all of the time. However, when it fails, the effect makes no sense from a documentation-only understanding of the garbage collection (unless Python is built with --with-assertions, perhaps). The only sequence in which it fails is:

* All of the objects that will reference the victim get moved into an older generation.
* The victim gets exactly as many buggy references from objects in its own generation as there are good references, and no good references from objects in its own generation.
* Garbage collection runs for only the younger generation.

At this point, the victim is marked as garbage despite the fact that there was a good object referencing it at all times.

Reading the Python source, it becomes clear that the garbage collector handles references from older generations not by traversing the objects in older generations, but by comparing the count of references from other young objects to the usual reference count. That is, visiting an object you don't hold a reference to may cause that object to get collected, rather than protecting it, even if there are other reasons not to collect it.

The best fix would probably be a warning in the documentation for tp_traverse that visiting an object you do not hold a strong reference to can cause inexplicable effects, because the intuitive guess would be that it could only cause memory leaks.

It would probably also be worth mentioning in the documentation of the garbage collector something about its algorithm, so people have a hint that references from older objects aren't necessarily sufficient to overcome buggy use of the C API.
msg372645 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2020-06-30 02:41
I don't see real value in the docs noting that Bad Things can happen if code lies about true refcounts. If a container points to an object, _of course_ the container should own that reference. Cheating on that isn't intended to be supported in any way, so there's no obligation to explain how or why things can go wrong otherwise. Worse, trying to explain such things would constrain implementations to have the same specific kinds of failure modes forever after. Neither would there would be real value in repeating "DON'T LIE ABOUT REFCOUNTS!" on every page, either ;-)

If someone just wants to know more about CPython's cyclic collector, that's fine, but that takes a great many more words than are suitable in the C API docs. Luckily, Pablo recently did that:

https://devguide.python.org/garbage_collector/
msg372650 - (view) Author: Daniel Barkalow (iabervon) Date: 2020-06-30 04:26
I think it would be helpful to have something as troubleshooting information on the garbage collector. If you've got a bug in your C module, it's obvious that you shouldn't be doing something egregiously wrong, but it's not obvious what you might have done wrong in order to cause that particular failure.

It wouldn't be useful to promise that, if you have a particular sort of bug, you'll get a particular sort of failure, or keep it the same from version to version, but it would be nice to be told, if you're getting tp_clear called on objects that aren't garbage in this particular version of Python, look for cases where you're calling py_VISIT on it more times than you called Py_INCREF. Alternatively, it would be helpful for the API documentation to say, "If you are having some strange problem with garbage collection, here's documentation (Developer Guide link) on what's really going on in the garbage collector, which might help explain what you could be doing wrong to get that effect."

For that matter, the documentation for tp_traverse doesn't mention that you have to call Py_VISIT on each object exactly as many times as references you hold to it; the phrasing sounds like it's fine so long as you call it at least once on any object you hold a reference to.
msg372798 - (view) Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2020-07-01 23:44
I agree with what Tim said. We also cannot foresee every possible bug of someone misusing the C API, and the garbage collector interacts with many systems of the VM, and having gigantic documentation has shown to be always detrimental. 

> I think it would be helpful to have something as troubleshooting information on the garbage collector. If you've got a bug in your C module,

"A bug in your C module" can be absolutely anything (even more things given that we are talking about C!). Many people also blame the GC immediately just because the crash happens during visitation when the bug already happened much before. I don't think that a "general debugging section" about the GC will be very as helpful as improving the clarity of the contract.

> For that matter, the documentation for tp_traverse doesn't mention that you have to call Py_VISIT on each object exactly as many times as references you hold to it; the phrasing sounds like it's fine so long as you call it at least once on any object you hold a reference to.

A note in the docs about this sounds reasonable as is well-scoped. If you want, you can do a PR and add me there and I will review it. Otherwise, I will try to put something together myself.
History
Date User Action Args
2020-07-01 23:44:04pablogsalsetmessages: + msg372798
2020-07-01 19:06:23nanjekyejoannahsetnosy: + pablogsal, nanjekyejoannah
2020-06-30 04:26:35iabervonsetmessages: + msg372650
2020-06-30 02:41:30tim.peterssetnosy: + tim.peters
messages: + msg372645
2020-06-26 22:29:51iabervoncreate