classification
Title: Establish a uniform way to clear all caches in a given module
Type: enhancement Stage: patch review
Components: Library (Lib), Tests Versions: Python 3.8
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: Anders.Hovmöller, Ma Lin, brett.cannon, ezio.melotti, michael.foord, rhettinger, serhiy.storchaka
Priority: normal Keywords: patch

Created on 2019-03-30 11:56 by serhiy.storchaka, last changed 2019-04-02 20:34 by rhettinger.

Pull Requests
URL Status Linked
PR 12632 open serhiy.storchaka, 2019-03-30 12:02
PR 12639 open serhiy.storchaka, 2019-03-31 09:11
Messages (14)
msg339190 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2019-03-30 11:56
Some modules have caches, and there is a need to clear all caches before running tests. Brett proposed adding, to every module with caches, a function with a standardized name that is responsible for clearing that module's caches. [1]

The proposed PR adds a new function clear_caches() to the sys module. It iterates over all imported modules and calls each module's __clearcache__() function if it is defined.

    import sys

    def clear_caches():
        # Iterate over a snapshot: a callback may import or remove modules.
        for mod in reversed(list(sys.modules.values())):
            if hasattr(mod, '__clearcache__'):
                mod.__clearcache__()

clear_caches() will be used in test.regrtest and can also be used in user code. The PR also defines __clearcache__ functions for the modules whose caches are currently cleared manually.
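A minimal sketch of what a module could look like under this naming convention, assuming the proposed `__clearcache__` hook (this issue's proposal, not an existing Python protocol). To stay self-contained it builds a hypothetical in-memory module with `types.ModuleType` instead of a real file, gives it a dict cache and an `lru_cache`-memoized helper, and repeats the proposed `clear_caches()` loop.

```python
import functools
import sys
import types

# An in-memory stand-in for a module file (the module name is hypothetical).
# __clearcache__ is the naming convention proposed in this issue, not an
# existing Python protocol.
demo = types.ModuleType('demo_module_with_caches')
demo.results = {}                                 # unbounded dict cache


@functools.lru_cache(maxsize=None)
def _parse(text):
    return tuple(text.split(','))


demo.parse = _parse                               # memoized helper


def _clearcache():
    """Empty every cache the demo module owns."""
    demo.results.clear()
    demo.parse.cache_clear()


demo.__clearcache__ = _clearcache
sys.modules[demo.__name__] = demo                 # make it visible to clear_caches()


def clear_caches():
    # The loop proposed for sys.clear_caches(), repeated so this runs standalone.
    for mod in reversed(list(sys.modules.values())):
        if hasattr(mod, '__clearcache__'):
            mod.__clearcache__()


demo.results['key'] = 'value'
demo.parse('a,b')
clear_caches()                                    # or, per module: demo.__clearcache__()
print(demo.results, demo.parse.cache_info().currsize)   # {} 0
```

Because discovery relies on hasattr(), a misspelled hook such as `__clearcaches__` would simply never be called; that silent failure is the trade-off Serhiy raises later in the thread.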

This is a preliminary implementation; bikeshedding is welcome.

[1] https://mail.python.org/pipermail/python-ideas/2019-March/056165.html
msg339227 - Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2019-03-30 22:47
Not sure that I agree there is a testing need to clear all caches regardless of what they do. Test code should explicitly state whether it relies on a particular cache being cleared at some particular point in time.

Also, the concept of "need to clear all caches" isn't well-formed.  Would you want sys.intern caches to be cleared? What about the internal caches in SQLite? 

And do you think polling for a new magic attribute is the right approach?  We would get looser coupling and better control by letting modules register themselves as needed.

--- re.py ---

sys.register_cache_clear_function(callback=purge, doc='pattern cache and re cache')

--- ipaddress.py ---

sys.register_cache_clear_function(callback=IPv4Address.is_private.fget.cache_clear, doc='check for private networks')
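The registration alternative can be sketched as an explicit registry. `register_cache_clear_function()` and `clear_all_caches()` below are hypothetical names (no such API exists in `sys`), and `is_private_demo` is a stand-in for a cached property like the one in `ipaddress`; only `re.purge()` and the `cache_clear()` method of `functools.lru_cache` are real stdlib calls.

```python
import functools
import re
from typing import Callable, List, Tuple

# Hypothetical registry: register_cache_clear_function() and clear_all_caches()
# are illustrative names, not an existing sys (or other stdlib) API.
_registry: List[Tuple[Callable[[], None], str]] = []


def register_cache_clear_function(callback: Callable[[], None], doc: str = '') -> None:
    """Record a zero-argument callable that empties one cache."""
    _registry.append((callback, doc))


def clear_all_caches() -> List[str]:
    """Call every registered callback and return the docs of the cleared caches."""
    cleared = []
    for callback, doc in _registry:
        callback()
        cleared.append(doc)
    return cleared


@functools.lru_cache(maxsize=None)
def is_private_demo(addr: str) -> bool:
    # Hypothetical stand-in for a cached check such as ipaddress's is_private.
    return addr.startswith('10.')


# Each module would register its own clearing callable when it is imported.
register_cache_clear_function(callback=re.purge, doc='pattern cache and re cache')
register_cache_clear_function(callback=is_private_demo.cache_clear,
                              doc='check for private networks')

is_private_demo('10.0.0.1')
print(clear_all_caches())                      # ['pattern cache and re cache', 'check for private networks']
print(is_private_demo.cache_info().currsize)   # 0
```

With explicit registration, a misspelled callback (say, `re.prge`) fails at import time with an AttributeError instead of being silently skipped, and the `doc` string gives each cache a human-readable label.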
msg339232 - Author: Michael Foord (michael.foord) * (Python committer) Date: 2019-03-30 23:48
An auto-magic cache clearing mechanism is really tempting. I tend to agree with Raymond, though: if code needs and exposes a cache clearing mechanism, it should be tested and accessible.

There are probably still some problematic caches within unittest, however. Do test results still keep all tracebacks alive until test reporting?
msg339234 - Author: Michael Foord (michael.foord) * (Python committer) Date: 2019-03-31 00:14
> On 30 Mar 2019, at 23:48, Michael Foord <report@bugs.python.org> wrote:
> An auto-magic cache clearing mechanism is really tempting. I tend to agree with Raymond though, if code needs and progress a cache clearing mechanism it should be treated and accessible. 

* exposes (not progress)
* tested  (not treated)

Sorry. 
msg339237 - Author: Michael Foord (michael.foord) * (Python committer) Date: 2019-03-31 01:15
Tests codify knowledge about the system under test, so it doesn't matter that the test suite has to know how to clear caches. It's specifically a good thing that the test writer knows which caches exist, which need clearing, and how to clear them. The harder part might be determining at what scope to do the clearing (per test, class, or module), but unittest exposes fixture hooks at those points for anything that needs doing automatically.
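A sketch of clearing caches through unittest's existing fixture hooks. `square` is a hypothetical cached function standing in for code under test; `re.purge()`, `cache_clear()`, `setUpModule()`, `setUpClass()` and `setUp()` are real. All three scopes appear together only for illustration; a real suite would normally clear each cache at the one scope its tests depend on.

```python
import functools
import re
import unittest


@functools.lru_cache(maxsize=None)
def square(x):
    """Hypothetical cached function standing in for a module under test."""
    return x * x


def setUpModule():
    # Module scope: unittest calls this once before the module's first test.
    re.purge()  # empties the re module's cache of compiled patterns


class SquareTests(unittest.TestCase):
    @classmethod
    def setUpClass(cls):
        # Class scope: runs once before the first test in this class.
        square.cache_clear()

    def setUp(self):
        # Test scope: runs before every test method, so each test starts cold.
        square.cache_clear()

    def test_starts_with_empty_cache(self):
        self.assertEqual(square.cache_info().currsize, 0)
        self.assertEqual(square(3), 9)
        self.assertEqual(square.cache_info().currsize, 1)
```

Run it with `python -m unittest` as usual.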
msg339250 - Author: Ma Lin (Ma Lin) * Date: 2019-03-31 08:30
I suggest writing the documentation in more detail.

For example, the __clearcache__ section should state explicitly that this magic function is for module-level caches and that it will be invoked by sys.clear_caches().

Maybe also introduce the background: some caches may grow without bound, and sys.clear_caches() gives the user a chance to empty them.
msg339251 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2019-03-31 09:11
My initial idea was to add a lightweight module cachesreg with two functions: register() and clear_caches(). PR 12639 implements it.

But I like Brett's idea more, because it is simpler. Its only disadvantage is that if you make a typo in __clearcache__, the function will be silently ignored.

I also thought about different levels of caches. cachesreg.register() could take two arguments: the level and the clearing function. Then cachesreg.clear_caches() could clear only the caches of a specified level and below (or above).
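The leveled variant might look like the following sketch. Everything here is hypothetical: the cachesreg-style `register(level, func)` and `clear_caches(max_level)` functions, the integer levels, and the assumed rule that clearing level N also clears every level below it.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List, Optional

# Hypothetical "cachesreg" sketch: register() and clear_caches() are
# illustrative names, and the rule "clear every level <= max_level" is one
# assumed reading of "specified level and smaller".
_callbacks: DefaultDict[int, List[Callable[[], None]]] = defaultdict(list)


def register(level: int, func: Callable[[], None]) -> None:
    """Register func as the clearing function for a cache at this level."""
    _callbacks[level].append(func)


def clear_caches(max_level: Optional[int] = None) -> int:
    """Call clearing functions with level <= max_level (every level if None).

    Returns how many clearing functions were called.
    """
    called = 0
    for level in sorted(_callbacks):
        if max_level is not None and level > max_level:
            continue
        for func in _callbacks[level]:
            func()
            called += 1
    return called


# Usage: two dict caches registered at different (arbitrary) levels.
small_cache = {'a': 1}
large_cache = {'b': 2}
register(0, small_cache.clear)
register(10, large_cache.clear)

print(clear_caches(max_level=0), small_cache, large_cache)   # 1 {} {'b': 2}
print(clear_caches(), large_cache)                            # 2 {}
```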

Both PRs add clearing callbacks only for the modules whose caches are already cleared in regrtest. There are more caches, and once either of these ideas is implemented, it will be easier to add cache clearing to other modules.
msg339254 - Author: Anders Hovmöller (Anders.Hovmöller) * Date: 2019-03-31 12:11
I think this is a great idea. We would have needed this many times for tests over the years.
msg339255 - Author: Ma Lin (Ma Lin) * Date: 2019-03-31 12:39
> My initial idea was to add a lightweight module cachesreg with two functions: register() and clear_caches().

If it only has two functions, it could be a sub-module, sys.cachesreg.

Or a lifecycle module, as the name suggests, dedicated to this kind of function: registering callbacks for low memory, system poweroff, etc.
I'm not asking for a lifecycle module; I'm just pointing out the possibility.
msg339301 - Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2019-04-01 18:17
RE: "And do you think polling for a new magic attribute is the right approach?": my thinking behind that idea is that by standardizing the function name it's easy to tell if there's a cache, and you can also do away with registration, replacing it with 3 lines of code. To me, the priority is clearing caches on a per-module basis, with a clear-all mechanism as a possible benefit, not the other way around.
msg339311 - Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2019-04-02 01:01
[Brett]
> To me, the priority is clearing caches on a per-module basis
> and having a clear-all mechanism can be beneficial, not the
> other way around.

That makes more sense.

I'm changing the title to match the actual feature request and intent:

"Add a way to clear all caches" -> "Establish a uniform way to clear all caches in a given module"
msg339312 - Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2019-04-02 01:05
Quick question, would the existing sys.reload() logic suffice?

-- mymodule.py --

cache = {}                  # On reload, this would clear the cache

def f(x):
    if x in cache:
        return cache[x]
    y = x**2
    cache[x] = y
    return y
msg339355 - Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2019-04-02 18:04
Did you mean importlib.reload() instead of sys.reload()?

And technically it would, *if* you're okay with the other side-effects of reloading, e.g. making sure no one holds a reference to any object from the module's namespace that won't change in place (for instance, if you stored a reference to the cache somewhere, the reload wouldn't clear it for that stored reference).
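To make the reload caveat concrete, the sketch below writes Raymond's mymodule.py into a temporary directory (demo scaffolding only), imports it, keeps a reference to its cache, and calls `importlib.reload()`. The reload rebinds the module-level name `cache` to a fresh dict, but the reference taken beforehand still holds the old entries.

```python
import importlib
import sys
import tempfile
import textwrap
from pathlib import Path

# Write the mymodule.py example from the thread into a temporary directory.
# The directory is illustrative scaffolding for this demo only.
source = textwrap.dedent('''\
    cache = {}                  # on reload this name is rebound to a new dict

    def f(x):
        if x in cache:
            return cache[x]
        y = x**2
        cache[x] = y
        return y
''')
tmpdir = tempfile.mkdtemp()
Path(tmpdir, 'mymodule.py').write_text(source)
sys.path.insert(0, tmpdir)
importlib.invalidate_caches()     # let the import system notice the new file

mymodule = importlib.import_module('mymodule')
mymodule.f(3)
stale = mymodule.cache            # some other code keeps a reference to the cache

importlib.reload(mymodule)        # re-executes the module body in the same namespace

print(mymodule.cache)             # {}
print(stale)                      # {3: 9}
```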
msg339366 - Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2019-04-02 20:34
> Did you mean importlib.reload() instead of sys.reload()?

Yes.

> And technically it would *if* you're okay with the other 
> side-effects of reloading,

If you want to go forward with this, go for it. I would like to be able to explain to another person why this is needed, but personally I can't visualize a circumstance where a person is testing a module, doesn't know how to use the existing cache clearing APIs, needs to clear caches (not sure why), and either doesn't know or doesn't want what happens on import. I've never seen this situation arise, but if it's something you want, I won't stand in the way.

BTW, if you're going to have some sort of clear_all(), perhaps it should cover the sys.intern() dictionary and string hashes as well. AFAICT, there's nothing special about a regex cache that gives it a greater need to be cleared.
History
Date                 User              Action  Args
2019-04-02 20:34:12  rhettinger        set     messages: + msg339366
2019-04-02 18:04:36  brett.cannon      set     messages: + msg339355
2019-04-02 01:05:33  rhettinger        set     messages: + msg339312
2019-04-02 01:01:04  rhettinger        set     messages: + msg339311
                                               title: Add a way to clear all caches -> Establish a uniform way to clear all caches in a given module
2019-04-01 18:17:56  brett.cannon      set     messages: + msg339301
2019-03-31 12:39:05  Ma Lin            set     messages: + msg339255
2019-03-31 12:11:33  Anders.Hovmöller  set     nosy: + Anders.Hovmöller
                                               messages: + msg339254
2019-03-31 09:11:48  serhiy.storchaka  set     pull_requests: + pull_request12571
2019-03-31 09:11:32  serhiy.storchaka  set     messages: + msg339251
2019-03-31 08:30:59  Ma Lin            set     nosy: + Ma Lin
                                               messages: + msg339250
2019-03-31 01:15:02  michael.foord     set     messages: + msg339237
2019-03-31 00:14:58  michael.foord     set     messages: + msg339234
2019-03-30 23:48:58  michael.foord     set     messages: + msg339232
2019-03-30 22:47:06  rhettinger        set     nosy: + rhettinger
                                               messages: + msg339227
2019-03-30 12:02:44  serhiy.storchaka  set     keywords: + patch
                                               stage: patch review
                                               pull_requests: + pull_request12563
2019-03-30 11:56:59  serhiy.storchaka  create