classification
Title: Emit ResourceWarning when implicitly terminating a suspended frame?
Type: resource usage Stage:
Components: Versions: Python 3.7
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: jstasiak, martin.panter, ncoghlan, njs, vstinner, yselivanov
Priority: normal Keywords:

Created on 2016-11-07 02:50 by ncoghlan, last changed 2017-07-19 09:12 by jstasiak.

Messages (11)
msg280185 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2016-11-07 02:50
There have been a few discussions recently regarding the fact that generator and coroutine cleanup can suppress ResourceWarnings from other components. For example:

    def mygen(fname):
        with open(fname) as f:
            yield from f

    print(next(mygen("README.md")))

Here, the opened file is actually being cleaned up by __del__ on the generator-iterator instance (when it throws GeneratorExit into the suspended frame), but there's no ResourceWarning as the *file itself* is being cleaned up by the file's __exit__ method.

I've been thinking about how we might be able to help developers better manage this, and one conclusion I've come to is that it means we need to start thinking about suspended frames that don't terminate naturally during the course of program execution as resources to be managed - their locals can and do reference scarce system resources, and folks are expecting "with" and "try-finally" statements to provide reliable management of those resources, even when there's a "yield", "yield from" or "await" in the nested suite.

So what do folks think of the idea of making __del__ on generator-iterator objects emit ResourceWarning in cases where:

- gi_frame is not None (i.e. the frame hasn't completed execution)
- gi_frame.f_lasti isn't -1 (i.e. the frame has started execution)

and similarly for coroutines and cr_frame?
msg280189 - (view) Author: Nathaniel Smith (njs) * (Python committer) Date: 2016-11-07 06:06
+1 to the change for generators. This is actually in the PEP 533 draft: https://www.python.org/dev/peps/pep-0533/#modifications-to-basic-iterator-types

For coroutines, doesn't this overlap with the existing "coroutine not awaited" warning?
msg280193 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-11-07 08:56
To take a decision, I would suggest to implement the warning, run the
Python test suite, and run the test suite of a few applications like
Twisted and Django, to see how much code needs to be modified.

Well, let's say that my code consumes a generator but emits the
warning because it doesn't consume completely the warning. What am I
supposed to do?

Let's say the the generator is an infine suite like (2**i for i in
itertools.count()). Should I call gen.close()?

If a generator holds a resource like a file, does gen.close()
immediately release the resource? Or does it rely on the garbage
collector?

I'm curious to see if using source=frame when emitting the warning
helps to identify quickly the origin of the bug. I added the source
parameter in Python 3.6. It requires the usage of tracemalloc to log
where the source object (the frame in this case) was allocated.
msg280200 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2016-11-07 14:29
As Victor suggests, the reason I ended the issue title with a question mark is because I'm worried that doing this by default is likely to be too noisy to be useful. It's worth trying out though, even if it ends up being something we have to hide behind a -X flag.

If that concern does prove well-founded, another possible way to go would be to change the way generators interact with the context management protocol. In order to ensure a clear exception when you accidentally leave out `contextlib.contextmanager`, they currently simply don't implement it, so you have to use `contextlib.closing` instead. However, you can't wrap an existing generator in that unconditionally, as you'll break direct calls (`closing` relies on `__enter__` to give callers transparent access to the original object).

One way to thread that needle would be to offer a `contextlib.managed_generator` alternative decorator that tweaked the behaviour of that particular generator-iterator instance to:

- emit a ResourceWarning in __del__
- support the context management protocol

Unfortunately, I haven't been able to think of a particularly clean implementation strategy that wouldn't introduce undesirable runtime performance overheads.
msg280728 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2016-11-14 03:10
This would be hard to get right, but maybe not impossible. If the frame is suspended, and will do special handling of GeneratorExit, a ResourceWarning would be nice in most cases. But if the frame will not catch any exception, there should be no ResourceWarning. I think this is the big difference between mygen() and Victor’s generator expression:

gen = (2**i for i in itertools.count())
next(gen)  # Generator suspended but won’t catch any exception
del gen  # Like deleting a plain iterator; no ResourceWarning expected

One more complication: If the frame is suspended at something like “yield from open(fname)”, it should trigger a ResourceWarning, because GeneratorExit would cause the file’s close() method to be called. But if “yield from” is calling a simpler generator-iterator with nothing to clean up, it should not emit a ResourceWarning.
msg280736 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-11-14 07:15
> del gen  # Like deleting a plain iterator; no ResourceWarning expected

Hum, wait, I'm not sure that I got the whole point of this issue.

If the generator uses "with obj:", "del gen" will not call obj.__exit__().

Do you consider that "del gen" is better to clear obj than not known
when gen and obj are destroyed?

Is there a solution to call obj.__exit__()?
msg280737 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2016-11-14 07:38
Perhaps I shouldn’t have mentioned “del gen”. That was meant to represent the garbage collector running and calling gen.__del__() or equivalent. Basically what I was trying to say is I think there will be two classes of generators:

1. Simple generators just implementing the plain iterator protocol, where there is nothing to clean up. In these cases a ResourceWarning is unwanted. Example: generator expressions.

2. Complex generators that have special cleanup code. Ideally the user should either exhaust the generator or call its close() method, and a ResourceWarning would be useful if neither of these happen. Example: Nick’s mygen().
msg280738 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2016-11-14 07:59
BTW my understanding is that currently, if the garbage collector deletes a generator, it effectively calls its close() method. When close() is called, it throws GeneratorExit into the suspended generator. This is meant to cause all the “with” and “try” statements to clean up.

So, yes, explicitly calling close() should immediately release the resources, call obj.__exit__(), etc. The idea of the ResourceWarning is to tell the programmer they forgot to call close().
msg280752 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2016-11-14 11:37
Martin raises an interesting point - the cases where ResourceWarning is likely to be appropriate are going to be those where the frame has been suspended inside a with statement or a try/finally statement.

However, we don't currently track those at runtime or at compile time in any way - they're implicit in the state of the frame stack.

That doesn't mean we *couldn't* track them though, and I think co_lnotab provides a hint as to how we could do it with minimal runtime overhead: add a new co_cleanup data table to code objects that provides efficiently packed storage for a sequence of "setup/cleanup" bytecode offset 2-tuples.

All code objects would gain an additional pointer, and those that used with and try/finally would get fractionally larger than that, but frame objects would remain the same size they are today.

Glossing over the complexities of large bytecode offsets (ala co_lnotab), consider:

>>> def g_with():
...     with ...:
...         pass
... 
>>> dis.dis(g_with)
  2           0 LOAD_CONST               1 (Ellipsis)
              3 SETUP_WITH               5 (to 11)
              6 POP_TOP

  3           7 POP_BLOCK
              8 LOAD_CONST               0 (None)
        >>   11 WITH_CLEANUP_START
             12 WITH_CLEANUP_FINISH
             13 END_FINALLY
             14 LOAD_CONST               0 (None)
             17 RETURN_VALUE

Here, co_cleanup would include a single 2-tuple with '0x03' as an absolute offset (for the setup) and '0x08' as a relative offset (for the cleanup). If '(f_lasti - 0x03) < 0x08' then __del__ on a generator-iterator or coroutine would raise ResourceWarning.

>>> def g_finally():
...     try:
...         pass
...     finally:
...         pass
... 
>>> dis.dis(g_finally)
  2           0 SETUP_FINALLY            4 (to 7)

  3           3 POP_BLOCK
              4 LOAD_CONST               0 (None)

  5     >>    7 END_FINALLY
              8 LOAD_CONST               0 (None)
             11 RETURN_VALUE

Here, co_cleanup would include a single 2-tuple with '0x00' as an absolute offset (for the setup) and '0x07' as a relative offset (for the cleanup). If '(f_lasti - 0x00) < 0x07' then __del__ on a generator-iterator or coroutine would raise ResourceWarning.

The key is that *only* __del__ would get slower (not explicit close() calls), and it would only get slower in cases where the frame was still suspended *and* co_cleanup was populated.
msg280762 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2016-11-14 12:16
Oops, I forgot to mention one other potential cost-minimisation strategy for a `co_cleanuptab` field: only populate it with setup/cleanup opcode pairs that include a yield, yield from, or await operation between them. 

The potential benefit I can see to *not* doing that is that if the information is always available (even on synchronous frames), then greenlet based frameworks like gevent may be able to make use of it.
msg290953 - (view) Author: Nathaniel Smith (njs) * (Python committer) Date: 2017-04-01 02:10
It does make sense to skip emitting a warning if there's no try or with block active.

Don't we already have the ability to check for this, though, without any extensions to the frame or code objects? That's what the public PyGen_NeedsFinalizing does, right? It's implemented as a for loop over the frame's block stack, which in most cases should be extremely small, so the performance cost of running this from generator.__del__ seems likely to be minimal.

(I think currently the implementation is not *quite* correct if there's a 'yield from' active – in most cases it won't much matter because if A is yielding from B and A is collected, then B will probably be collected a moment later and have its own __del__ called. But this is not *quite* the same as what should happen, which is that collecting A should immediately call B.close(), which in principle might do something arbitrarily different than B.__del__. But adding a check for whether a 'yield from' is active would be pretty trivial.)
History
Date User Action Args
2017-07-19 09:12:49jstasiaksetnosy: + jstasiak
2017-04-01 02:10:10njssetmessages: + msg290953
2016-11-14 12:16:23ncoghlansetmessages: + msg280762
2016-11-14 11:37:44ncoghlansetmessages: + msg280752
2016-11-14 07:59:46martin.pantersetmessages: + msg280738
2016-11-14 07:38:50martin.pantersetmessages: + msg280737
2016-11-14 07:15:53vstinnersetmessages: + msg280736
2016-11-14 03:10:56martin.pantersetnosy: + martin.panter
messages: + msg280728
2016-11-07 14:29:24ncoghlansetmessages: + msg280200
2016-11-07 08:56:04vstinnersetmessages: + msg280193
2016-11-07 06:06:55njssetmessages: + msg280189
2016-11-07 02:50:31ncoghlancreate