Message 308090 - Python tracker


This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Author	njs
Recipients	chris.jerdonek, giampaolo.rodola, mbussonn, ncoghlan, njs, vstinner, yselivanov
Date	2017-12-12.07:55:54
SpamBayes Score	-1.0
Marked as misclassified	Yes
Message-id	<1513065354.82.0.213398074469.issue30491@psf.upfronthosting.co.za>
In-reply-to

Content

Update!

I've been experimenting with this some more, and here's a more detailed proposal that I'd ideally like to get into 3.7. I don't *think* this is big enough to need a PEP? I dunno, thoughts on that welcome.

Motivation: It's easy to accidentally write 'f()' where you meant 'await f()', which is why Python issues a warning whenever an unawaited coroutine is GCed. This helps, and for asyncio proper, it may not be possible to do better than this -- since the problem is detected at GC time, there's very little we can do *except* print a warning. In particular, we can't raise an error. But this warning is still easy to miss, and prone to obscure problems: it's easy to have a test that passes ... because it didn't actually run any code. And then the warning is attached to a different test entirely. But, in some specific cases, we could do better: for example, if pytest-asyncio could check for unawaited coroutines after each test, it could immediately raise a proper and detailed error on the correct test. And if trio could check for unawaited coroutines at selected points like schedule points, it could reliably detect these problems and raise them as errors, right at the source.
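The GC-time limitation can be observed directly. The sketch below assumes CPython 3.8 or later (for `sys.unraisablehook`) and uses only standard-library calls; `f`, `call_without_await` and `run_demo` are illustrative names. Even with `RuntimeWarning` escalated to an error, nothing is raised at the faulty call: the warning fires inside the coroutine's finalizer, so the resulting exception is routed to the unraisable-exception hook instead of reaching the caller.

```python
import gc
import sys
import warnings


async def f():
    return 42


def call_without_await():
    # The bug: 'f()' written where 'await f()' was meant. On CPython the
    # coroutine object is freed as soon as this statement finishes.
    f()


def run_demo():
    """Return (exception type, message) pairs reported from finalizers."""
    reported = []
    old_hook = sys.unraisablehook
    # Keep only type and text; storing args.object would resurrect the
    # coroutine that is being finalized.
    sys.unraisablehook = lambda args: reported.append(
        (args.exc_type, str(args.exc_value)))
    try:
        with warnings.catch_warnings():
            warnings.simplefilter("error", RuntimeWarning)
            call_without_await()  # returns normally: no exception reaches us
            gc.collect()          # only matters on non-refcounting interpreters
    finally:
        sys.unraisablehook = old_hook
    return reported


print(run_demo())
```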

Specification: We add two new functions, sys.set_unawaited_coroutine_tracking(enabled: bool) -> None and sys.collect_unawaited_coroutines() -> List[Coroutine]. The semantics are: internally, there is a thread-local bool I'll call tracking_enabled that defaults to False. set_unawaited_coroutine_tracking lets you set it. If tracking_enabled == False, everything works like now. If tracking_enabled == True, then the interpreter internally keeps a table of all unawaited coroutine objects: when a coroutine object is created, it's automatically added to the table; when it's awaited, it's automatically removed. When collect_unawaited_coroutines is called, it returns the current contents of the table as a list, and clears it. The table holds a strong reference to the coroutines in it, which makes this a simple and reliable way to track unawaited coroutines (but also means that we need the enable/disable API instead of leaving it on all the time, because once it's enabled someone needs to call collect_unawaited_coroutines regularly to avoid a memory leak).
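These semantics can be emulated in pure Python for experimentation. This is a minimal sketch, not the proposed C implementation: the module-level `set_unawaited_coroutine_tracking` and `collect_unawaited_coroutines` below are hypothetical stand-ins for the proposed `sys` functions (CPython does not provide them), and the hypothetical `tracked` decorator replaces the interpreter's creation-time hook, so only decorated coroutine functions are tracked. "Removed when awaited" is modelled lazily: at collection time, any coroutine that has already started running is dropped.

```python
import asyncio
import functools
import inspect
import threading

# Hypothetical stand-ins for the proposed sys-level API.
_state = threading.local()  # per-thread .enabled flag and .table list


def set_unawaited_coroutine_tracking(enabled):
    _state.enabled = bool(enabled)


def _table():
    if not hasattr(_state, "table"):
        _state.table = []
    return _state.table


def tracked(async_fn):
    """Hypothetical decorator standing in for the interpreter's creation hook."""
    @functools.wraps(async_fn)
    def wrapper(*args, **kwargs):
        coro = async_fn(*args, **kwargs)
        if getattr(_state, "enabled", False):  # tracking defaults to off
            _table().append(coro)  # strong reference, as in the proposal
        return coro
    return wrapper


def collect_unawaited_coroutines():
    table = _table()
    # Lazy version of "removed when awaited": keep only never-started ones.
    unawaited = [c for c in table
                 if inspect.getcoroutinestate(c) == inspect.CORO_CREATED]
    table.clear()
    return unawaited


@tracked
async def fetch():
    return 1


async def main():
    await fetch()  # awaited: will not be reported
    fetch()        # bug: missing 'await'
    return collect_unawaited_coroutines()


set_unawaited_coroutine_tracking(True)
leaked = asyncio.run(main())
print([c.__qualname__ for c in leaked])
for c in leaked:
    c.close()  # already reported, so silence the GC-time warning
set_unawaited_coroutine_tracking(False)
```

Because the table holds strong references, the forgotten `fetch()` coroutine is still alive when it is collected, which is what makes a precise error report possible.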

Implementation: this can be made fast and cheap by storing the table as a thread-specific intrusive doubly linked list. Basically each coroutine object would gain two pointer slots (this adds a small amount of memory overhead, but a coroutine object + frame is already >500 bytes, so the relative overhead is low), which are used to link it into the list when it's created (O(1), very cheap) and then unlink it again when it's awaited (also O(1), very cheap).
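The intrusive list can be modelled in Python to show why both operations are O(1). In this sketch `TrackedCoro` is a hypothetical stand-in for a coroutine object carrying the two extra pointer slots, and `IntrusiveList` stands for the per-thread table; a sentinel head makes the list circular, so linking and unlinking need no special cases. In the actual proposal these slots would live in the C coroutine struct.

```python
class TrackedCoro:
    """Hypothetical coroutine object with the two extra pointer slots."""
    __slots__ = ("name", "prev", "next")

    def __init__(self, name):
        self.name = name
        self.prev = None  # None means "not currently in any list"
        self.next = None


class IntrusiveList:
    """Circular doubly linked list with a sentinel head (one per thread)."""

    def __init__(self):
        self.head = TrackedCoro("<sentinel>")
        self.head.prev = self.head.next = self.head

    def link(self, node):
        """On coroutine creation: O(1) append before the sentinel."""
        tail = self.head.prev
        node.prev, node.next = tail, self.head
        tail.next = node
        self.head.prev = node

    def unlink(self, node):
        """On first await: O(1) removal, no search needed."""
        if node.prev is None:  # already unlinked (or never tracked)
            return
        node.prev.next = node.next
        node.next.prev = node.prev
        node.prev = node.next = None

    def drain(self):
        """collect_unawaited_coroutines: O(n) walk, then reset to empty."""
        out = []
        node = self.head.next
        while node is not self.head:
            nxt = node.next
            node.prev = node.next = None
            out.append(node)
            node = nxt
        self.head.prev = self.head.next = self.head
        return out


table = IntrusiveList()
a, b, c = TrackedCoro("a"), TrackedCoro("b"), TrackedCoro("c")
for coro in (a, b, c):
    table.link(coro)      # each coroutine is linked as it is created
table.unlink(b)           # b gets awaited
print([n.name for n in table.drain()])  # ['a', 'c']
```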

Rejected alternatives:

- The original comment above suggested keeping a count of unawaited coroutines instead of tracking the actual objects, but this way is just about as cheap while (a) allowing for much better debugging information when an unawaited coroutine is detected, since you have the actual objects there and (b) avoiding a mess of issues around unawaited coroutines that get GCed before the user checks for them.

- What about using the existing coroutine wrapper hook? You could do this, but this proposal has two advantages. First, it's much faster, which is important because Trio wants to run with this enabled by default, and pytest-asyncio doesn't want to slow down everyone's test suites too much. (I should benchmark this properly, but in general the coroutine wrappers add a ton of overhead because they're effectively a whole new Python-level object being allocated on every coroutine function call.) And second, since the coroutine wrapper hook is such a generic mechanism, it's prone to collisions between different uses. For example, pytest-asyncio's unawaited coroutine detection and asyncio's debug mode seem like they ought to complement each other: pytest-asyncio finds the problematic coroutines, and then asyncio's debug mode gives the details on where they came from. But if they're both trying to use the same coroutine wrapper hook, then they'll end up fighting over it. So this proposal follows Python's general rule that generic hooks are fine when you really need an escape hatch, but if there's a specific use case it's often worth handling it specifically. (Recent example: module __class__ assignment vs. PEP 562.)
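The debugging advantage of tracking objects rather than a count is easy to see in code: a count can only say "2 coroutines were never awaited", whereas the objects themselves carry a name and a code location. The helper below is a hypothetical sketch of what a test plugin might do with the list a collect call returns; it uses only standard coroutine attributes (`__qualname__`, `cr_code`) and closes each coroutine once reported, so the usual GC-time warning is not printed as well.

```python
def unawaited_error(coros):
    """Hypothetical helper: build one detailed error from collected coroutines."""
    lines = []
    for coro in coros:
        code = coro.cr_code
        lines.append(f"  {coro.__qualname__} "
                     f"(defined at {code.co_filename}:{code.co_firstlineno})")
        coro.close()  # reported here, so suppress the later 'never awaited' warning
    return RuntimeError("coroutine(s) never awaited:\n" + "\n".join(lines))


async def load_config():
    return {}


async def save_results():
    return None


# Pretend these two came back from a collect call after a test finished.
err = unawaited_error([load_config(), save_results()])
print(err)
```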
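The collision problem with a single generic hook can be shown with a toy model. Everything here is hypothetical and only loosely modelled on the old `sys.set_coroutine_wrapper` (removed in Python 3.8): there is one process-wide slot, and two independent users each install a hook into it.

```python
# Hypothetical model of one process-wide hook slot.
_wrapper = None


def set_wrapper(fn):
    global _wrapper
    _wrapper = fn  # silently replaces whoever installed a hook before


def on_coroutine_created(coro):
    return coro if _wrapper is None else _wrapper(coro)


seen_by_tracker = []
seen_by_debug = []


def tracker_hook(coro):  # e.g. a test plugin hunting unawaited coroutines
    seen_by_tracker.append(coro)
    return coro


def debug_hook(coro):    # e.g. an event loop's debug mode recording origins
    seen_by_debug.append(coro)
    return coro


set_wrapper(tracker_hook)
set_wrapper(debug_hook)  # second user wins; the tracker is now disconnected
on_coroutine_created("coro-1")
print(seen_by_tracker, seen_by_debug)  # [] ['coro-1']
```

A dedicated mechanism avoids this because each feature keeps its own state instead of competing for one slot.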

History
Date	User	Action	Args
2017-12-12 07:55:54	njs	set	recipients: + njs, ncoghlan, vstinner, giampaolo.rodola, chris.jerdonek, yselivanov, mbussonn
2017-12-12 07:55:54	njs	set	messageid: <1513065354.82.0.213398074469.issue30491@psf.upfronthosting.co.za>
2017-12-12 07:55:54	njs	link	issue30491 messages
2017-12-12 07:55:54	njs	create