classification
Title: Stop purging modules which are garbage collected before shutdown
Type: enhancement Stage: resolved
Components: Interpreter Core Versions: Python 3.4
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: Arfrever, amaury.forgeotdarc, christian.heimes, gregory.p.smith, pitrou, python-dev, sbt
Priority: normal Keywords: patch

Created on 2013-06-14 14:59 by sbt, last changed 2013-08-01 20:12 by pitrou. This issue is now closed.

Files
File name Uploaded Description
prevent-purge-before-shutdown.patch sbt, 2013-06-14 14:59 review
module_cleanup.patch pitrou, 2013-07-30 18:26 review
module_cleanup2.patch pitrou, 2013-07-30 21:36 review
module_cleanup3.patch pitrou, 2013-07-30 22:03 review
module_cleanup4.patch pitrou, 2013-07-31 09:35 review
module_cleanup5.patch pitrou, 2013-07-31 20:17 review
check_purging.py sbt, 2013-07-31 22:39
Messages (24)
msg191135 - (view) Author: Richard Oudkerk (sbt) * (Python committer) Date: 2013-06-14 14:59
Currently when a module is garbage collected its dict is purged by replacing all values except __builtins__ by None.  This helps clear things at shutdown. 

But this can cause problems if it occurs *before* shutdown: if we use a function defined in a module which has been garbage collected, then that function must not depend on any globals, because they will have been purged.
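
The failure mode can be sketched as follows. This is only a simulation: `simulate_purge` is a hypothetical helper that mimics the purging described above at the Python level (the real cleanup happens inside the interpreter), and `demo_mod`, `GREETING` and `greet` are made-up names.

```python
import types


def simulate_purge(namespace):
    """Hypothetical stand-in for the purge: set every value except
    __builtins__ to None."""
    for key in list(namespace):
        if key != "__builtins__":
            namespace[key] = None


# Build a throwaway module whose function depends on a module global.
mod = types.ModuleType("demo_mod")
exec(
    "GREETING = 'hello'\n"
    "def greet():\n"
    "    return GREETING.upper()\n",
    mod.__dict__,
)

greet = mod.greet          # keep only the function, as a caller might
before = greet()           # works while the globals are intact

simulate_purge(mod.__dict__)

error = None
try:
    greet()                # GREETING is now None
except AttributeError as exc:
    error = str(exc)
```

After the simulated purge the function object is still alive, but every global it relies on has been replaced by None, so the call fails.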

Usually this problem only occurs with programs which manipulate sys.modules.  For example when setuptools and nose run tests they like to reset sys.modules each time.  See for example

  http://bugs.python.org/issue15881

See also

  http://bugs.python.org/issue16718

The trivial patch attached prevents the purging behaviour for modules gc'ed before shutdown begins.  Usually garbage collection will end up clearing the module's dict anyway.

I checked the count of refs and blocks reported on exit when running a trivial program and a full regrtest (which will cause quite a bit of sys.modules manipulation).  The difference caused by the patch is minimal.

Without patch:
  do nothing:    [20234 refs, 6582 blocks]
  full regrtest: [92713 refs, 32597 blocks]

With patch:
  do nothing:    [20234 refs, 6582 blocks]
  full regrtest: [92821 refs, 32649 blocks]
msg191217 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-06-15 18:11
> Usually garbage collection will end up clearing the module's dict anyway.

This is not true, since global objects might have a __del__ and then hold the whole module dict alive through a reference cycle. Happily though, PEP 442 is going to make that concern obsolete.
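
A minimal sketch of the PEP 442 behaviour, assuming Python 3.4 or later; the `Node` class and the `finalized` list are illustrative names only. Objects with a `__del__` that sit in a reference cycle are finalized and collected instead of being left uncollectable.

```python
import gc

finalized = []


class Node:
    """Object with a finalizer that takes part in a reference cycle."""

    def __init__(self, name):
        self.name = name
        self.partner = None

    def __del__(self):
        finalized.append(self.name)


a, b = Node("a"), Node("b")
a.partner, b.partner = b, a   # a <-> b cycle, both with __del__
del a, b

gc.collect()                  # with PEP 442 the cycle is finalized and freed
uncollectable = [obj for obj in gc.garbage if isinstance(obj, Node)]
```

The order in which the two finalizers run is not specified, so any check on `finalized` should compare it sorted.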

As for the interpreter shutdown itself, I have a pending patch (post-PEP 442) to get rid of the globals cleanup as well. It may be better to merge the two approaches.
msg191229 - (view) Author: Richard Oudkerk (sbt) * (Python committer) Date: 2013-06-15 19:43
On 15/06/2013 7:11pm, Antoine Pitrou wrote:
>> Usually garbage collection will end up clearing the module's dict anyway.
>
> This is not true, since global objects might have a __del__ and then hold
> the whole module dict alive through a reference cycle. Happily though,
> PEP 442 is going to make that concern obsolete.

I did say "usually".

> As for the interpreter shutdown itself, I have a pending patch (post-PEP 442)
> to get rid of the globals cleanup as well. It may be better to merge the two approaches.

So you would just depend on garbage collection?  Do you know how many 
refs/blocks are left at exit if one just uses garbage collection 
(assuming PEP 442 is in effect)?  I suppose adding GC support to those 
modules which currently lack it would help a lot.

BTW, I had a more complicated patch which keeps track of module dicts 
using weakrefs and purges any which were left after garbage collection 
has had a chance to free stuff.  But most module dicts ended up being 
purged anyway, so it did not seem worth the hassle when a two-line patch 
mostly fixes the immediate problem.
msg191230 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-06-15 19:44
> > As for the interpreter shutdown itself, I have a pending patch (post-PEP 442)
> > to get rid of the globals cleanup as well. It may be better to merge the two approaches.
> 
> So you would just depend on garbage collection?

No, I also clean up those modules that are left alive after a garbage
collection pass.
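
A rough Python-level sketch of that idea, not the patch itself: remember the modules through weak references, drop the strong references, run a collection, and only wipe the modules that are still alive. The names `cleanup_modules`, `wipe` and `registry` are hypothetical, and a plain dict stands in for sys.modules.

```python
import gc
import types
import weakref


def wipe(namespace):
    """Fallback for survivors: set every value except __builtins__ to None."""
    for key in list(namespace):
        if key != "__builtins__":
            namespace[key] = None


def cleanup_modules(registry):
    """Drop all modules from registry; wipe only those that survive a GC pass.

    Returns the names of the modules that had to be wiped.
    """
    weak_modules = [(name, weakref.ref(mod)) for name, mod in registry.items()]
    registry.clear()
    gc.collect()

    wiped = []
    for name, ref in weak_modules:
        mod = ref()
        if mod is not None:        # something else still references it
            wipe(mod.__dict__)
            wiped.append(name)
    return wiped


registry = {}
for name in ("free", "pinned"):
    module = types.ModuleType(name)
    module.VALUE = name.upper()
    registry[name] = module

pinned = registry["pinned"]        # external reference keeps this one alive
del module, name

wiped = cleanup_modules(registry)
```

Here only the module that is still referenced from outside the registry gets its globals set to None; the other one simply goes away.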
msg193945 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-07-30 18:26
Now that PEP 442 is committed, here is the patch.
msg193957 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-07-30 21:36
Slightly better patch.

Also, as I pointed out in python-dev (http://mail.python.org/pipermail/python-dev/2013-July/127673.html), this is still imperfect due to various ways modules can be kept alive from long-lived C variables.
msg193976 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-07-31 06:44
See issue10068 and issue7140.
msg193991 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-07-31 09:35
Updated patch has tests and also removes several cleanup hacks.
msg194015 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-07-31 20:17
Updated patch with a hack in Lib/site to unpatch builtins early at shutdown.
msg194020 - (view) Author: Roundup Robot (python-dev) Date: 2013-07-31 21:15
New changeset 79e2f5bbc30c by Antoine Pitrou in branch 'default':
Issue #18214: Improve finalization of Python modules to avoid setting their globals to None, in most cases.
http://hg.python.org/cpython/rev/79e2f5bbc30c
msg194021 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-07-31 21:16
Let's wait for the buildbots on this one too.
msg194026 - (view) Author: Richard Oudkerk (sbt) * (Python committer) Date: 2013-07-31 22:39
I played a bit with the patch and -v -Xshowrefcount.  The number of references and blocks left at exit varies (and is higher than for unpatched python).

It appears that a few (1-3) module dicts are not being purged because they have been "orphaned".  (i.e. the module object was garbage collected before we check the weakref, but the module dict survived.)  Presumably it is the hash randomization causing the randomness.

Maybe 8 out of 50+ module dicts actually die a natural death by being garbage collected before they are purged.  Try

    ./python -v -Xshowrefcount check_purging.py
msg194040 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-08-01 09:49
> It appears that a few (1-3) module dicts are not being purged because they 
> have been "orphaned".  (i.e. the module object was garbaged collected before 
> we check the weakref, but the module dict survived.) 

Module globals can be kept alive by any function defined in that module. So if that function is registered eternally in a C static variable, the globals dict will never get collected.
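
A small sketch of how a dict becomes "orphaned" this way, assuming Python 3.4 or later (where modules are weakrefable and their dict is no longer cleared when the module is deallocated). `orphan_demo`, `VALUE` and `get_value` are made-up names.

```python
import gc
import types
import weakref

mod = types.ModuleType("orphan_demo")
exec(
    "VALUE = 42\n"
    "def get_value():\n"
    "    return VALUE\n",
    mod.__dict__,
)

get_value = mod.get_value      # the only thing we keep from the module
mod_ref = weakref.ref(mod)

del mod
gc.collect()

module_gone = mod_ref() is None             # the module object is gone
result = get_value()                        # but its globals still work
globals_alive = get_value.__globals__["VALUE"] == 42
```

The function holds the module dict through its `__globals__` attribute, so the dict outlives the module object for as long as the function is reachable.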

> ./python -v -Xshowrefcount check_purging.py

I always get either:

# remaining {'encodings', '__main__'}
[...]
[24834 refs, 7249 blocks]

or

# remaining {'__main__', 'encodings'}
[...]
[24834 refs, 7249 blocks]

... which seems to hint that it is quite stable actually.
The encodings globals are kept alive because of the codecs registration, I believe.
As for the __main__ dict, perhaps we're missing a decref somewhere.

> Maybe 8 out of 50+ module dicts actually die a natural death by being 
> garbage collected before they are purged.

I get different numbers from you. If I run "./python -v -c pass", most modules in the "wiping" phase are C extension modules, which is expected. Pretty much every pure Python module ends up garbage collected before that.

By the way, please also try issue18608 which will bring another improvement.
msg194042 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-08-01 09:59
> As for the __main__ dict, perhaps we're missing a decref somewhere.

Actually, it's not surprising. Blob's methods hold a reference to the __main__ globals, and there's still a Blob object alive in encodings.

If you replace the end of your script with the following:

for name, mod in sys.modules.items():
    if name != 'encodings':
        mod.__dict__["__blob__"] = Blob(name)
del name, mod, Blob


then at the end of the shutdown phase, remaining is empty.
msg194043 - (view) Author: Richard Oudkerk (sbt) * (Python committer) Date: 2013-08-01 10:27
On 01/08/2013 10:59am, Antoine Pitrou wrote:
> If you replace the end of your script with the following:
>
> for name, mod in sys.modules.items():
>      if name != 'encodings':
>          mod.__dict__["__blob__"] = Blob(name)
> del name, mod, Blob
>
>
> then at the end of the shutdown phase, remaining is empty.

On Windows, even with this change, I get for example:

   # remaining {'encodings.mbcs', '__main__', 'encodings.cp1252'}
   ...
   [22081 refs, 6742 blocks]

or

   # remaining {'__main__', 'encodings'}
   ...
   [23538 refs, 7136 blocks]
msg194044 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-08-01 10:40
You might want to open a prompt and look at gc.get_referrers() for encodings.mbcs.__dict__ (or another of those modules).
msg194045 - (view) Author: Richard Oudkerk (sbt) * (Python committer) Date: 2013-08-01 11:08
> You might want to open a prompt and look at gc.get_referrers() for 
> encodings.mbcs.__dict__ (or another of those modules).

>>> gc.get_referrers(sys.modules['encodings.mbcs'].__dict__)
[<module 'encodings.mbcs' from 'C:\\Repos\\cpython-dirty\\lib\\encodings\\mbcs.py'>, <function decode at 0x01DEEF38>, <function getregentry at 0x01DFA038>, <function IncrementalEncoder.encode at 0x01DFA098>]

>>> gc.get_referrers(sys.modules['encodings.cp1252'].__dict__)
[<module 'encodings.cp1252' from 'C:\\Repos\\cpython-dirty\\lib\\encodings\\cp1252.py'>, <function getregentry at 0x02802578>, <function Codec.encode at 0x02802518>, <function Codec.decode at 0x028025D8>, <function IncrementalEncoder.encode at 0x02802638>, <function IncrementalDecoder.decode at 0x02802698>]

>>> gc.get_referrers(sys.modules['__main__'].__dict__)
[<function Blob.__init__ at 0x0057ABD8>, <function Blob.__del__ at 0x02AD36F8>,
<frame object at 0x027DFA80>, <function <listcomp> at 0x02AD3DB8>, <frame object at 0x02A38038>, <module '__main__' (<_frozen_importlib.SourceFileLoader object
at 0x0271EAB8>)>]
msg194047 - (view) Author: Richard Oudkerk (sbt) * (Python committer) Date: 2013-08-01 11:17
> I get different numbers from you. If I run "./python -v -c pass", most 
> modules in the "wiping" phase are C extension modules, which is expected. 
> Pretty much every pure Python module ends up garbage collected before 
> that.

The *module* gets gc'ed, sure.  But you can't tell from "./python -v -c pass" when the *module dict* gets gc'ed.

Using "./python -v check_purging.py", before the purging stage (# cleanup [3]) I only get

# purge/gc operator 54
# purge/gc io 53
# purge/gc keyword 52
# purge/gc types 51
# purge/gc sysconfig 50

That leaves lots of pure python module dicts to be purged later on.
msg194055 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-08-01 12:23
Here (Linux) I get the following:

# purge/gc os.path 12
# purge/gc operator 50
# purge/gc io 49
# purge/gc _sysconfigdata 48
# purge/gc sysconfig 47
# purge/gc keyword 46
# purge/gc site 45
# purge/gc types 44

Also, do note that purge/gc after wiping can still be a regular gc pass unless the module has been wiped. The gc could be triggered by another module being wiped.
msg194066 - (view) Author: Richard Oudkerk (sbt) * (Python committer) Date: 2013-08-01 13:03
> Also, do note that purge/gc after wiping can still be a regular
> gc pass unless the module has been wiped. The gc could be triggered
> by another module being wiped.

For me, the modules which die naturally after purging begins are

# purge/gc encodings.aliases 34
# purge/gc _io 14
# purge/gc collections.abc 13
# purge/gc sre_compile 12
# purge/gc heapq 11
# purge/gc sre_constants 10
# purge/gc _weakrefset 9
# purge/gc reprlib 8
# purge/gc weakref 7
# purge/gc site 6
# purge/gc abc 5
# purge/gc encodings.latin_1 4
# purge/gc encodings.utf_8 3
# purge/gc genericpath 2

Of these, all but the first appear to happen during the final cyclic 
garbage collection.
msg194069 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-08-01 13:12
> Also, do note that purge/gc after wiping can still be a regular gc
> pass unless the module has been wiped. The gc could be triggered by
> another module being wiped.

That said, I welcome any suggestions to improve things. The ultimate
reasons we need to purge some modules are the same two reasons I
indicated on python-dev: C extension modules are almost immortal;
and some C code keeps references alive too long.

Do you agree that this patch is ok and we should address those two
problems in separate new issues?
msg194076 - (view) Author: Richard Oudkerk (sbt) * (Python committer) Date: 2013-08-01 14:14
Yes, I agree the patch is ok.

It would be much simpler to keep track of the module dicts if 
they were weakrefable.  Alternatively, at shutdown a weakrefable object 
with a reference to the module dict could be inserted into each module 
dict.  We could then use those to find orphaned module dicts.  But I 
doubt it is worth the extra effort.
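
A sketch of what that alternative could look like, under the assumption that it is done from Python code; `DictSentinel`, `tag_module_dicts`, `find_orphaned_dicts` and the `__sentinel__` key are all hypothetical names, and a plain dict stands in for sys.modules.

```python
import gc
import types
import weakref


class DictSentinel:
    """Weakrefable marker that refers back to the dict it is stored in."""

    def __init__(self, namespace):
        self.namespace = namespace


def tag_module_dicts(modules):
    """Insert a sentinel into each module dict; return weakrefs to them."""
    refs = {}
    for name, mod in modules.items():
        sentinel = DictSentinel(mod.__dict__)
        mod.__dict__["__sentinel__"] = sentinel
        refs[name] = weakref.ref(sentinel)
    return refs


def find_orphaned_dicts(refs):
    """After a GC pass, a live sentinel means its module dict survived."""
    gc.collect()
    orphans = {}
    for name, ref in refs.items():
        sentinel = ref()
        if sentinel is not None:
            orphans[name] = sentinel.namespace
    return orphans


SOURCE = "VALUE = 1\ndef get_value():\n    return VALUE\n"
modules = {}
for name in ("kept", "dropped"):
    mod = types.ModuleType(name)
    exec(SOURCE, mod.__dict__)
    modules[name] = mod

survivor = modules["kept"].get_value   # pins the "kept" globals
refs = tag_module_dicts(modules)
del mod, name
modules.clear()                        # both module objects go away

orphans = find_orphaned_dicts(refs)
```

The sentinel and its dict form a reference cycle, so an unreferenced dict is only reclaimed by the cyclic collector; that is why the check runs `gc.collect()` first.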
msg194079 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-08-01 15:14
Ok, let's attack the rest separately then :)
And thanks a lot for testing!
msg194111 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-08-01 20:12
By the way, you may be interested to learn that the patch in issue10241 has made things quite a bit better now: C extension modules can be collected much earlier.
History
Date User Action Args
2013-08-21 06:13:40 pitrou link issue812369 superseder
2013-08-01 20:12:44 pitrou set messages: + msg194111
2013-08-01 15:14:06 pitrou set status: open -> closed; messages: + msg194079
2013-08-01 14:14:55 sbt set messages: + msg194076
2013-08-01 13:12:15 pitrou set messages: + msg194069
2013-08-01 13:03:46 sbt set messages: + msg194066
2013-08-01 12:23:41 pitrou set messages: + msg194055
2013-08-01 11:17:24 sbt set messages: + msg194047
2013-08-01 11:08:35 sbt set messages: + msg194045
2013-08-01 10:40:39 pitrou set messages: + msg194044
2013-08-01 10:27:14 sbt set messages: + msg194043
2013-08-01 09:59:25 pitrou set messages: + msg194042
2013-08-01 09:56:34 christian.heimes set nosy: + christian.heimes
2013-08-01 09:49:15 pitrou set messages: + msg194040
2013-07-31 22:39:26 sbt set files: + check_purging.py; messages: + msg194026
2013-07-31 21:16:25 pitrou set resolution: fixed; messages: + msg194021; stage: patch review -> resolved
2013-07-31 21:15:45 python-dev set nosy: + python-dev; messages: + msg194020
2013-07-31 20:17:28 pitrou set files: + module_cleanup5.patch; messages: + msg194015
2013-07-31 19:24:21 Arfrever set nosy: + Arfrever
2013-07-31 09:36:01 pitrou link issue7140 superseder
2013-07-31 09:35:14 pitrou set files: + module_cleanup4.patch; messages: + msg193991
2013-07-31 06:44:39 pitrou set messages: + msg193976
2013-07-30 22:03:42 pitrou set files: + module_cleanup3.patch
2013-07-30 21:36:30 pitrou set files: + module_cleanup2.patch; messages: + msg193957
2013-07-30 18:30:34 pitrou set nosy: + gregory.p.smith, amaury.forgeotdarc; type: enhancement; components: + Interpreter Core; stage: patch review
2013-07-30 18:26:14 pitrou set files: + module_cleanup.patch; messages: + msg193945
2013-06-15 19:44:33 pitrou set messages: + msg191230
2013-06-15 19:43:07 sbt set messages: + msg191229
2013-06-15 18:11:39 pitrou set messages: + msg191217
2013-06-15 17:08:30 sbt set nosy: + pitrou
2013-06-14 14:59:39 sbt create