classification
Title: Run gc_collect() before complaining about dangling threads
Type: resource usage Stage:
Components: Tests Versions: Python 3.7
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: christian.heimes, ezio.melotti, njs, pitrou, smurfix, vstinner
Priority: normal Keywords: patch

Created on 2018-02-15 14:02 by smurfix, last changed 2018-03-27 22:34 by njs.

Files
File name Uploaded Description Edit
gc.patch smurfix, 2018-02-15 14:02 Patch
gc.patch smurfix, 2018-02-15 14:12
Messages (9)
msg312206 - (view) Author: Matthias Urlichs (smurfix) * Date: 2018-02-15 14:02
Lib/test/support/__init__.py::threading_cleanup() complains about dangling threads even if the reference in question would be cleaned up by the garbage collector.

This is not useful, esp. when the list of referrers to the "dangling" thread looks like this:

[<frame at 0x7fe830195768, file '/usr/lib/python3.7/threading.py', line 869, code run>, <frame at 0x7fe828000b38, file '/usr/lib/python3.7/threading.py', line 966, code _bootstrap_inner>, <frame at 0x7fe83018aac8, file '/usr/lib/python3.7/threading.py', line 889, code _bootstrap>]

Thus I propose to check, run gc, check again, and only *then* complain-and-wait. Hence the attached patch for your consideration.
msg312207 - (view) Author: Matthias Urlichs (smurfix) * Date: 2018-02-15 14:12
Upon further consideration (and following the observation that my test cases no longer block for two seconds each after applying the first version of this patch): we do not want to clear the reference to "dangling_threads" since that's a weakset. We want to clear the "thread" variable, which holds a reference to a random member of that set, which will arbitrarily block the cleanup loop from ever succeeding.

Updated patch attached.
msg314547 - (view) Author: Matthias Urlichs (smurfix) * Date: 2018-03-27 20:30
Apparently this patch has not been applied yet. Is there a reason for that, besides "it's obviously correct so there must be something wrong with it"? ;-)
msg314548 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2018-03-27 20:38
We generally prefer pull requests on github over patches on the bug tracker. Please follow the guidelines for contributing to CPython, https://devguide.python.org/pullrequest/
msg314552 - (view) Author: Nathaniel Smith (njs) * (Python committer) Date: 2018-03-27 21:56
Yes, a PR would be better... also, the two versions of the patch appear to be identical?

CC'ing Antoine and Victor b/c they seem to have written this dangling_threads checking stuff and I'm not sure I understand it :-).

As some extra background: this is in Python's internal testing code, but @smurfix is hitting it because he's working on a third-party asyncio event loop implementation, and re-using test.asyncio to test it. I'm not sure what specifically is causing this to happen, though I guess it has something to do with how trio / trio-asyncio are using threads.

[1] https://github.com/python-trio/trio-asyncio
msg314553 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-03-27 22:01
> Lib/test/support/__init__.py::threading_cleanup() complains about dangling threads even if the reference in question would be cleaned up by the garbage collector.

It's a deliberate choice. It helped me to find real bugs. For example, I found a very old reference cycle in socket.create_connection().

The error message ("dangling threads") doesn't help, but it means that "something is wrong" in your test.

Sadly, sometimes you have to explicitly do something like "self.thread = None" in your test.

I wrote an article to explain these things:
https://vstinner.github.io/contrib-cpython-2017q3-part2.html

Instead of calling support.gc_collect(), I suggest to write documentation, maybe as a comment in support.threading_cleanup()?, explaining the warning and how to fix it. Maybe with a link to this issue? :-)
msg314554 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-03-27 22:04
> This is not useful, esp. when the list of referrers to the "dangling" thread looks like this: (...)

I wrote the test.bisect tool to help to debug dangling threads issues.

Sadly, sometimes the warning can occur randomly because of race conditions, and it can be painful to reproduce the bug, and to fix the root issues.
msg314555 - (view) Author: Matthias Urlichs (smurfix) * Date: 2018-03-27 22:21
> It's a deliberate choice. It helped me to find real bugs. For example, I found a very old reference cycle in socket.create_connection().

Fair enough; I will change the patch to complain before gc'ing.
msg314558 - (view) Author: Nathaniel Smith (njs) * (Python committer) Date: 2018-03-27 22:34
@Victor: So to make sure I understand, the point of the check is to complain about reference cycles involving thread objects, because when those happen in the stdlib you consider them bugs?
History
Date User Action Args
2018-03-27 22:34:22njssetmessages: + msg314558
2018-03-27 22:21:27smurfixsetmessages: + msg314555
2018-03-27 22:04:17vstinnersetmessages: + msg314554
2018-03-27 22:01:11vstinnersetmessages: + msg314553
2018-03-27 21:56:38njssetnosy: + pitrou, vstinner
messages: + msg314552
2018-03-27 20:38:51christian.heimessetnosy: + christian.heimes
messages: + msg314548
2018-03-27 20:30:14smurfixsetmessages: + msg314547
2018-02-15 18:40:29njssetnosy: + ezio.melotti, njs
2018-02-15 14:12:42smurfixsetfiles: + gc.patch

messages: + msg312207
2018-02-15 14:02:32smurfixcreate