Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[subinterpreters] GC should happen when a subinterpreter is destroyed #68742

Closed
ericsnowcurrently opened this issue Jul 3, 2015 · 14 comments
Closed
Labels
3.9 only security fixes topic-subinterpreters type-bug An unexpected behavior, bug, or error

Comments

@ericsnowcurrently
Copy link
Member

BPO 24554
Nosy @ncoghlan, @pitrou, @vstinner, @phsilva, @ericsnowcurrently, @ajdavis

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

Show more details

GitHub fields:

assignee = None
closed_at = <Date 2020-01-08.13:36:19.447>
created_at = <Date 2015-07-03.00:41:17.474>
labels = ['expert-subinterpreters', 'type-bug', '3.9']
title = '[subinterpreters] GC should happen when a subinterpreter is destroyed'
updated_at = <Date 2020-05-15.01:13:31.252>
user = 'https://github.com/ericsnowcurrently'

bugs.python.org fields:

activity = <Date 2020-05-15.01:13:31.252>
actor = 'vstinner'
assignee = 'none'
closed = True
closed_date = <Date 2020-01-08.13:36:19.447>
closer = 'vstinner'
components = ['Subinterpreters']
creation = <Date 2015-07-03.00:41:17.474>
creator = 'eric.snow'
dependencies = []
files = []
hgrepos = []
issue_num = 24554
keywords = []
message_count = 14.0
messages = ['246110', '246157', '246158', '246159', '246160', '246161', '246162', '246163', '246165', '246166', '341983', '357063', '358064', '359593']
nosy_count = 7.0
nosy_names = ['ncoghlan', 'pitrou', 'vstinner', 'phsilva', 'grahamd', 'eric.snow', 'emptysquare']
pr_nums = []
priority = 'normal'
resolution = 'fixed'
stage = 'resolved'
status = 'closed'
superseder = None
type = 'behavior'
url = 'https://bugs.python.org/issue24554'
versions = ['Python 3.9']

@ericsnowcurrently
Copy link
Member Author

Per A. Jesse Jiryu Davis:

===========================

# mod.py
class C(object):
    pass

class Pool(object):
    def __del__(self):
        print('del')
        list()

C.pool = Pool()
===========================

===========================

int main()
{
    Py_Initialize();
    PyThreadState *tstate_enter = PyThreadState_Get();
    PyThreadState *tstate = Py_NewInterpreter();

    PyRun_SimpleString("import mod\n");
    if (PyErr_Occurred()) {
        PyErr_Print();
    }
    Py_EndInterpreter(tstate);
    PyThreadState_Swap(tstate_enter);
    printf("about to finalize\n");
    Py_Finalize();
    printf("done\n");

    return 0;
}

===========================

See:
http://emptysqua.re/blog/a-normal-accident-in-python-and-mod-wsgi/
GrahamDumpleton/mod_wsgi#43

@ericsnowcurrently ericsnowcurrently added interpreter-core (Objects, Python, Grammar, and Parser dirs) type-bug An unexpected behavior, bug, or error labels Jul 3, 2015
@pitrou
Copy link
Member

pitrou commented Jul 3, 2015

What is that supposed to demonstrate? GC is a global operation and is not tied to subinterpreters: GC may happen in any interpreter, not necessarily the one where the resource was allocated.

@grahamd
Copy link
Mannequin

grahamd mannequin commented Jul 3, 2015

That GC happens on an object in the wrong interpreter in this case is the problem as it can result in used code execution against the wrong interpreter context.

If you are saying this can happen anytime in the life of a sub interpreter and not just in this case of when a sub interpreter is destroyed followed by destruction of the main interpreter, then that is an even bigger flaw.

@pitrou
Copy link
Member

pitrou commented Jul 3, 2015

It's just a consequence of subinterpreters not being isolated contexts. They're sharing of lot of stuff by construction (hence being called "subinterpreters"). And indeed some resource can become unreachable in a subinterpreter, and collected from another, if the resource is part of a reference cycle.

@pitrou
Copy link
Member

pitrou commented Jul 3, 2015

I don't think this is a very important issue, by the way. Normal destructors will usually rely on resources on their global environment, i.e. the function's globals or builtins dict, which will point to the right namespace. Only if you are explicitly looking up something on the interpreter (or using e.g. thread-local storage... but relying on thread-local storage in a destructor is already broken anyway) will you see such discrepancies.

@pitrou
Copy link
Member

pitrou commented Jul 3, 2015

From a quick look at the PyInterpreterState, stuff that may be risky to rely on:

  • mutable data from the sys module (mainly import-related data: sys.path, sys.meta_path, etc.)
  • codecs registry metadata

Of course third-party modules (C extensions) may key additional data on the current interpreter, but I can't think of any stdlib module that does.

@grahamd
Copy link
Mannequin

grahamd mannequin commented Jul 3, 2015

If this issue with GC can't be addressed and sub interpreters isolated better, then there is no point pursing then the idea that has been raised at the language summit of giving each sub interpreter its own GIL and then provide mechanisms to allow code executing in one sub interpreter to delegate other code to run in the context of a different sub interpreter, thus allowing effective use of multi core systems.

That was the bigger goal and this was one of the issues which would need to be fixed.

@pitrou
Copy link
Member

pitrou commented Jul 3, 2015

I don't have any opinion on whether this issue kills the "parallelism with sub-interpreters" idea (I'm not sure why it would). But regardless, solving this issue will be very non-trivial, or perhaps very involved.

Running the GC at the end of a subinterpreter certainly won't solve the general problem of objects becoming unreachable in an interpreter (i.e. the last external reference being lost in that interpreter) and collected by the cyclic GC from another.

@grahamd
Copy link
Mannequin

grahamd mannequin commented Jul 3, 2015

Right now mod_wsgi is the main user of sub interpreters. I wasn't even aware of this issue until Jesse found it. Thus in 7+ years, it never presented a problem in practice, possibly because in mod_wsgi sub interpreters are only ever destroyed on process shutdown and causing an issue at that point or a process crash was not noticed and tolerable

If however you are going to implement the "parallelism with sub-interpreters' idea you are making the possibility of encountering problems much more prevalent because you will likely have many more people using the feature, plus that a sub interpreter may be ephemeral and not necessarily kept around for the life of process, but destroyed at any time thus more readily pushing GC of objects into a different sub interpreter context if that is what can occur now.

It therefore seems to me that this would open up a huge can of worms if left to work as it does now with users seeing all sorts of unexpected behaviour if not very careful. Also, for GC of objects to be able to be done in a different interpreter context seems to suggest to me that the global GIL for whole process couldn't be eliminated in the first place. So at this I point can't see how you could move to a separate GIL for each interpreter context, if GC for each interpreter can't be separated easily or at all.

@ncoghlan
Copy link
Contributor

ncoghlan commented Jul 3, 2015

We already knew that reference count management was likely to be one of the thorniest problems with allowing true subinterpreter level concurrency, this issue just reminds us that the cyclic GC is going to be a challenge as well.

@ericsnowcurrently
Copy link
Member Author

FYI, issue bpo-36854 is about moving GC runtime state from _PyRuntimeState to PyInterpreterState. However, that doesn't trigger any collection when the interpreter is finalized. So there is more to be done here.

@vstinner
Copy link
Member

bpo-36854 has been fixed, so it's time to reconsider fixing this issue :-)

@vstinner
Copy link
Member

vstinner commented Dec 9, 2019

This issue is partially fixed in the master branch. Extract of the finalize_interp_clear() function, called by Py_EndInterpreter():

    /* Clear interpreter state and all thread states */
    PyInterpreterState_Clear(tstate->interp);

    /* Trigger a GC collection on subinterpreters*/
    if (!is_main_interp) {
        _PyGC_CollectNoFail();
    }

gc.collect() is now called.

It's only "partially" fixed because I would prefer to trigger a GC collection before or during PyInterpreterState_Clear(). IMHO trigger it after PyInterpreterState_Clear() creates a risk of crash in finalizers written in C which don't handle well before called very late during Python finalization. After PyInterpreterState_Clear(), Python is basically unusable. All modules are cleared.

@vstinner
Copy link
Member

vstinner commented Jan 8, 2020

Py_EndInterpreter() now calls gc.collect() at least twice: at the end of _PyImport_Cleanup() and in finalize_interp_clear(). I now consider the issue as fixed and so I close it.

The issue that I described in my previous comment can be enhanced/fixed later.

@vstinner vstinner added the 3.9 only security fixes label Jan 8, 2020
@vstinner vstinner closed this as completed Jan 8, 2020
@vstinner vstinner added topic-subinterpreters and removed interpreter-core (Objects, Python, Grammar, and Parser dirs) labels May 15, 2020
@vstinner vstinner changed the title GC should happen when a subinterpreter is destroyed [subinterpreters] GC should happen when a subinterpreter is destroyed May 15, 2020
@ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3.9 only security fixes topic-subinterpreters type-bug An unexpected behavior, bug, or error
Projects
None yet
Development

No branches or pull requests

4 participants