Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Stop purging modules which are garbage collected before shutdown #62414

Closed
sbt mannequin opened this issue Jun 14, 2013 · 24 comments
Closed

Stop purging modules which are garbage collected before shutdown #62414

sbt mannequin opened this issue Jun 14, 2013 · 24 comments
Labels
interpreter-core (Objects, Python, Grammar, and Parser dirs) type-feature A feature request or enhancement

Comments

@sbt
Copy link
Mannequin

sbt mannequin commented Jun 14, 2013

BPO 18214
Nosy @gpshead, @amauryfa, @pitrou, @tiran
Files
  • prevent-purge-before-shutdown.patch
  • module_cleanup.patch
  • module_cleanup2.patch
  • module_cleanup3.patch
  • module_cleanup4.patch
  • module_cleanup5.patch
  • check_purging.py
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2013-08-01.15:14:06.763>
    created_at = <Date 2013-06-14.14:59:39.507>
    labels = ['interpreter-core', 'type-feature']
    title = 'Stop purging modules which are garbage collected before shutdown'
    updated_at = <Date 2013-08-01.20:12:44.810>
    user = 'https://bugs.python.org/sbt'

    bugs.python.org fields:

    activity = <Date 2013-08-01.20:12:44.810>
    actor = 'pitrou'
    assignee = 'none'
    closed = True
    closed_date = <Date 2013-08-01.15:14:06.763>
    closer = 'pitrou'
    components = ['Interpreter Core']
    creation = <Date 2013-06-14.14:59:39.507>
    creator = 'sbt'
    dependencies = []
    files = ['30583', '31090', '31091', '31093', '31099', '31101', '31105']
    hgrepos = []
    issue_num = 18214
    keywords = ['patch']
    message_count = 24.0
    messages = ['191135', '191217', '191229', '191230', '193945', '193957', '193976', '193991', '194015', '194020', '194021', '194026', '194040', '194042', '194043', '194044', '194045', '194047', '194055', '194066', '194069', '194076', '194079', '194111']
    nosy_count = 7.0
    nosy_names = ['gregory.p.smith', 'amaury.forgeotdarc', 'pitrou', 'christian.heimes', 'Arfrever', 'python-dev', 'sbt']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue18214'
    versions = ['Python 3.4']

    @sbt
    Copy link
    Mannequin Author

    sbt mannequin commented Jun 14, 2013

    Currently when a module is garbage collected its dict is purged by replacing all values except __builtins__ by None. This helps clear things at shutdown.

    But this can cause problems if it occurs *before* shutdown: if we use a function defined in a module which has been garbage collected, then that function must not depend on any globals, because they will have been purged.

    Usually this problem only occurs with programs which manipulate sys.modules. For example when setuptools and nose run tests they like to reset sys.modules each time. See for example

    http://bugs.python.org/issue15881

    See also

    http://bugs.python.org/issue16718

    The trivial patch attached prevents the purging behaviour for modules gc'ed before shutdown begins. Usually garbage collection will end up clearing the module's dict anyway.

    I checked the count of refs and blocks reported on exit when running a trivial program and a full regrtest (which will cause quite a bit of sys.modules manipulation). The difference caused by the patch is minimal.

    Without patch:
    do nothing: [20234 refs, 6582 blocks]
    full regrtest: [92713 refs, 32597 blocks]

    With patch:
    do nothing: [20234 refs, 6582 blocks]
    full regrtest: [92821 refs, 32649 blocks]

    @pitrou
    Copy link
    Member

    pitrou commented Jun 15, 2013

    Usually garbage collection will end up clearing the module's dict anyway.

    This is not true, since global objects might have a __del__ and then hold the whole module dict alive through a reference cycle. Happily though, PEP-442 is going to make that concern obsolete.

    As for the interpreter shutdown itself, I have a pending patch (post-PEP 442) to get rid of the globals cleanup as well. It may be better to merge the two approaches.

    @sbt
    Copy link
    Mannequin Author

    sbt mannequin commented Jun 15, 2013

    On 15/06/2013 7:11pm, Antoine Pitrou wrote:

    > Usually garbage collection will end up clearing the module's dict anyway.

    This is not true, since global objects might have a __del__ and then hold
    the whole module dict alive through a reference cycle. Happily though,
    PEP-442 is going to make that concern obsolete.

    I did say "usually".

    As for the interpreter shutdown itself, I have a pending patch (post-PEP 442)
    to get rid of the globals cleanup as well. It may be better to merge the two approaches.

    So you would just depend on garbage collection? Do you know how many
    refs/blocks are left at exit if one just uses garbage collection
    (assuming PEP-442 is in effect)? I suppose adding GC support to those
    modules which currently lack it would help a lot.

    BTW, I had a more complicated patch which keeps track of module dicts
    using weakrefs and purges any which were left after garbage collection
    has had a chance to free stuff. But most module dicts ended up being
    purged anyway, so it did not seem worth the hassle when a two-line patch
    mostly fixes the immediate problem.

    @pitrou
    Copy link
    Member

    pitrou commented Jun 15, 2013

    > As for the interpreter shutdown itself, I have a pending patch (post-PEP 442)
    > to get rid of the globals cleanup as well. It may be better to merge the two approaches.

    So you would just depend on garbage collection?

    No, I also clean up those modules that are left alive after a garbage
    collection pass.

    @pitrou
    Copy link
    Member

    pitrou commented Jul 30, 2013

    Now that PEP-442 is committed, here is the patch.

    @pitrou pitrou added interpreter-core (Objects, Python, Grammar, and Parser dirs) type-feature A feature request or enhancement labels Jul 30, 2013
    @pitrou
    Copy link
    Member

    pitrou commented Jul 30, 2013

    Slightly better patch.

    Also, as I pointed out in python-dev (http://mail.python.org/pipermail/python-dev/2013-July/127673.html), this is still imperfect due to various ways modules can be kept alive from long-lived C variables.

    @pitrou
    Copy link
    Member

    pitrou commented Jul 31, 2013

    See bpo-10068 and bpo-7140.

    @pitrou
    Copy link
    Member

    pitrou commented Jul 31, 2013

    Updated patch has tests and also removes several cleanup hacks.

    @pitrou
    Copy link
    Member

    pitrou commented Jul 31, 2013

    Updated patch with a hack in Lib/site to unpatch builtins early at shutdown.

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Jul 31, 2013

    New changeset 79e2f5bbc30c by Antoine Pitrou in branch 'default':
    Issue bpo-18214: Improve finalization of Python modules to avoid setting their globals to None, in most cases.
    http://hg.python.org/cpython/rev/79e2f5bbc30c

    @pitrou
    Copy link
    Member

    pitrou commented Jul 31, 2013

    Let's wait for the buildbots on this one too.

    @sbt
    Copy link
    Mannequin Author

    sbt mannequin commented Jul 31, 2013

    I played a bit with the patch and -v -Xshowrefcount. The number of references and blocks left at exit varies (and is higher than for unpatched python).

    It appears that a few (1-3) module dicts are not being purged because they have been "orphaned". (i.e. the module object was garbaged collected before we check the weakref, but the module dict survived.) Presumably it is the hash randomization causing the randomness.

    Maybe 8 out of 50+ module dicts actually die a natural death by being garbage collected before they are purged. Try

    ./python -v -Xshowrefcount check_purging.py
    

    @pitrou
    Copy link
    Member

    pitrou commented Aug 1, 2013

    It appears that a few (1-3) module dicts are not being purged because they
    have been "orphaned". (i.e. the module object was garbaged collected before
    we check the weakref, but the module dict survived.)

    Module globals can be kept alive by any function defined in that module. So if that function is registered eternally in a C static variable, the globals dict will never get collected.

    ./python -v -Xshowrefcount check_purging.py

    I always get either:

    # remaining {'encodings', '__main__'}
    [...]
    [24834 refs, 7249 blocks]

    or

    # remaining {'__main__', 'encodings'}
    [...]
    [24834 refs, 7249 blocks]

    ... which seems to hint that it is quite stable actually.
    The encodings globals are kept alive because of the codecs registration, I believe.
    As for the __main__ dict, perhaps we're missing a decref somewhere.

    Maybe 8 out of 50+ module dicts actually die a natural death by being
    garbage collected before they are purged.

    I get different numbers from you. If I run "./python -v -c pass", most modules in the "wiping" phase are C extension modules, which is expected. Pretty much every pure Python module ends up garbage collected before that.

    By the way, please also try bpo-18608 which will bring an other improvement.

    @pitrou
    Copy link
    Member

    pitrou commented Aug 1, 2013

    As for the __main__ dict, perhaps we're missing a decref somewhere.

    Actually, it's not surprising. Blob's methods hold a reference to the __main__ globals, and there's still a Blob object alive in encodings.

    If you replace the end of your script with the following:

    for name, mod in sys.modules.items():
        if name != 'encodings':
            mod.__dict__["__blob__"] = Blob(name)
    del name, mod, Blob

    then at the end of the shutdown phase, remaining is empty.

    @sbt
    Copy link
    Mannequin Author

    sbt mannequin commented Aug 1, 2013

    On 01/08/2013 10:59am, Antoine Pitrou wrote:

    If you replace the end of your script with the following:

    for name, mod in sys.modules.items():
    if name != 'encodings':
    mod.__dict__["__blob__"] = Blob(name)
    del name, mod, Blob

    then at the end of the shutdown phase, remaining is empty.

    On Windows, even with this change, I get for example:

    # remaining {'encodings.mbcs', '__main__', 'encodings.cp1252'}
    ...
    [22081 refs, 6742 blocks]

    or

    # remaining {'__main__', 'encodings'}
    ...
    [23538 refs, 7136 blocks]

    @pitrou
    Copy link
    Member

    pitrou commented Aug 1, 2013

    You might want to open a prompt and look at gc.get_referrers() for encodings.mbcs.__dict__ (or another of those modules).

    @sbt
    Copy link
    Mannequin Author

    sbt mannequin commented Aug 1, 2013

    You might want to open a prompt and look at gc.get_referrers() for
    encodings.mbcs.__dict__ (or another of those modules).

    >>> gc.get_referrers(sys.modules['encodings.mbcs'].__dict__)
    [<module 'encodings.mbcs' from 'C:\\Repos\\cpython-dirty\\lib\\encodings\\mbcs.py'>, <function decode at 0x01DEEF38>, <function getregentry at 0x01DFA038>, <function IncrementalEncoder.encode at 0x01DFA098>]
    
    >>> gc.get_referrers(sys.modules['encodings.cp1252'].__dict__)
    [<module 'encodings.cp1252' from 'C:\\Repos\\cpython-dirty\\lib\\encodings\\cp1252.py'>, <function getregentry at 0x02802578>, <function Codec.encode at 0x02802518>, <function Codec.decode at 0x028025D8>, <function IncrementalEncoder.encode at 0x02802638>, <function IncrementalDecoder.decode at 0x02802698>]
    
    >>> gc.get_referrers(sys.modules['__main__'].__dict__)
    [<function Blob.__init__ at 0x0057ABD8>, <function Blob.__del__ at 0x02AD36F8>,
    <frame object at 0x027DFA80>, <function <listcomp> at 0x02AD3DB8>, <frame object at 0x02A38038>, <module '__main__' (<_frozen_importlib.SourceFileLoader object
    at 0x0271EAB8>)>]

    @sbt
    Copy link
    Mannequin Author

    sbt mannequin commented Aug 1, 2013

    I get different numbers from you. If I run "./python -v -c pass", most
    modules in the "wiping" phase are C extension modules, which is expected.
    Pretty much every pure Python module ends up garbage collected before
    that.

    The *module* gets gc'ed, sure. But you can't tell from "./python -v -c pass" when the *module dict* get gc'ed.

    Using "./python -v check_purging.py", before the purging stage (# cleanup [3]) I only get

    # purge/gc operator 54
    # purge/gc io 53
    # purge/gc keyword 52
    # purge/gc types 51
    # purge/gc sysconfig 50

    That leaves lots of pure python module dicts to be purged later on.

    @pitrou
    Copy link
    Member

    pitrou commented Aug 1, 2013

    Here (Linux) I get the following:

    # purge/gc os.path 12
    # purge/gc operator 50
    # purge/gc io 49
    # purge/gc _sysconfigdata 48
    # purge/gc sysconfig 47
    # purge/gc keyword 46
    # purge/gc site 45
    # purge/gc types 44

    Also, do note that purge/gc after wiping can still be a regular gc pass unless the module has been wiped. The gc could be triggered by another module being wiped.

    @sbt
    Copy link
    Mannequin Author

    sbt mannequin commented Aug 1, 2013

    Also, do note that purge/gc after wiping can still be a regular
    gc pass unless the module has been wiped. The gc could be triggered
    by another module being wiped.

    For me, the modules which die naturally after purging begins are

    # purge/gc encodings.aliases 34
    # purge/gc _io 14
    # purge/gc collections.abc 13
    # purge/gc sre_compile 12
    # purge/gc heapq 11
    # purge/gc sre_constants 10
    # purge/gc _weakrefset 9
    # purge/gc reprlib 8
    # purge/gc weakref 7
    # purge/gc site 6
    # purge/gc abc 5
    # purge/gc encodings.latin_1 4
    # purge/gc encodings.utf_8 3
    # purge/gc genericpath 2

    Of these, all but the first appear to happen during the final cyclic
    garbage collection.

    @pitrou
    Copy link
    Member

    pitrou commented Aug 1, 2013

    Also, do note that purge/gc after wiping can still be a regular gc
    pass unless the module has been wiped. The gc could be triggered by
    another module being wiped.

    That said, I welcome any suggestions to improve things. The ultimate
    reasons we need to purge some modules are the same two reasons I
    indicated on python-dev: C extension modules are almost immortal;
    and some C code keeps references alive too long.

    Do you agree that this patch is ok and we should address those two
    problems in separate new issues?

    @sbt
    Copy link
    Mannequin Author

    sbt mannequin commented Aug 1, 2013

    Yes, I agree the patch is ok.

    It would be would be much simpler to keep track of the module dicts if
    they were weakrefable. Alternatively, at shutdown a weakrefable object
    with a reference to the module dict could be inserted in to each module
    dict. We could then use those to find orphaned module dicts. But I
    doubt it is worth the extra effort.

    @pitrou
    Copy link
    Member

    pitrou commented Aug 1, 2013

    Ok, let's attack the rest separately then :)
    And thanks a lot for testing!

    @pitrou pitrou closed this as completed Aug 1, 2013
    @pitrou
    Copy link
    Member

    pitrou commented Aug 1, 2013

    By the way, you may be interested to learn that the patch in bpo-10241 has made things quite a bit better now: C extension modules can be collected much earlier.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    interpreter-core (Objects, Python, Grammar, and Parser dirs) type-feature A feature request or enhancement
    Projects
    None yet
    Development

    No branches or pull requests

    1 participant