classification
Title: Stop checking for directory cache invalidation in importlib
Type: behavior Stage: needs patch
Components: Interpreter Core Versions: Python 3.4
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: brett.cannon Nosy List: asvetlov, barry, brett.cannon, eric.snow, erik.bray, ncoghlan, pitrou
Priority: normal Keywords: patch

Created on 2013-03-01 18:38 by erik.bray, last changed 2013-04-28 03:24 by brett.cannon. This issue is now closed.

Files
File name Uploaded Description Edit
94e671589c0e.diff erik.bray, 2013-03-01 18:38 review
drop_implicit_cache_invalidation.diff brett.cannon, 2013-03-01 22:00 review
Repositories containing patches
https://bitbucket.org/embray/cpython#import-cache-fix
Messages (18)
msg183270 - (view) Author: Erik Bray (erik.bray) * Date: 2013-03-01 18:38
This is vaguely related to issue14067, though the patch suggested there would make this problem worse, not better.

This addresses a problem that cropped up on OSX in which some code that, for Good Reasons, generates a module in a package directory and then expects to be able to import it.  Because HFS+ does not offer sub-second resolution for mtime the generated module is not found. We didn't have this problem on Linux or Windows.

I've attached one possible solution to the issue, which uses a tuple of st_mtime and st_nlink to determine if the cache should be invalidated.  This should pick up most file creation/deletion on such file systems even if the directory's mtime hasn't changed.  This doesn't add any additional system calls so it shouldn't have any significant performance impact.

In the meantime importlib.invalidate_caches() offers a consistent workaround, but it was surprising when this issue was found to be platform dependent.
msg183273 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2013-03-01 18:56
[for anyone else who doesn't know what st_nlink is for: http://docs.python.org/3/library/stat.html#stat.ST_NLINK]

Since it doesn't add stat overhead as it's just part of what os.path.stats() returns I'm fine with adding it to part of the invalidation scheme. What do you think, Antoine?
msg183276 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-03-01 19:06
Well, importlib.invalidate_caches() is not a workaround, it's really the right way to solve your issue (for example if instead of writing a new module, you modify an existing one: st_nlink will then remain the same).
msg183281 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2013-03-01 19:33
Antoine does make a good point: adding st_nlink could lead to some hard-to-discover bugs because something will work the first time when the file is added but then subsequently fail if you are modifying the new file in-place.

I'm going to go with Antoine's thinking and close this bug as "won't fix". Thanks for at least taking a stab at the code, Erik.
msg183284 - (view) Author: Erik Bray (erik.bray) * Date: 2013-03-01 19:51
Why should modifying the file in place be expected to do anything with respect to the directory cache?  If that module has already been imported then modifying it should not require the cache to be invalidated.

If the file is modified *before* it's imported that's still irrelevant because that means the file exists in the directory and will be included in the cache when it gets generated.

Relying on invalidate_caches() is a workaround because it's not even necessary on better filesystems--the existing behavior already leads to hard to find bugs.
msg183285 - (view) Author: Erik Bray (erik.bray) * Date: 2013-03-01 19:57
Put another way, the cache associated with a FileFinder only keeps track of the filenames in a directory, and not their individual mtimes.  So if a new file is added to the directory the cache should be invalided.  Likewise if a file is removed.

If a file is modified or removed and a new file with the same name subsequently added this is irrelevant to the cache.  Importing that module will still succeed if it hasn't already been imported.  If that module's file is *not* in the cache then an ImportError is raised.
msg183286 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-03-01 20:08
> Put another way, the cache associated with a FileFinder only keeps
> track of the filenames in a directory, and not their individual
> mtimes.  So if a new file is added to the directory the cache should
> be invalided.  Likewise if a file is removed.

True, but if a file is removed and another added, st_nlink won't change.
Which will lead to even harder to find bugs.

I agree it would be better if it were always possible to detect
directory modifications, without any false negatives. Sadly it is not
(blame POSIX, I guess, although Windows is not better here), hence
invalidate_caches().
msg183287 - (view) Author: Erik Bray (erik.bray) * Date: 2013-03-01 20:17
Fair enough, but then I would propose removing the automatic cache invalidation based on mtime altogether.  It's just not reliable enough--this caused previously working code to break unexpectedly, and only on a single platform at that.

I'm fine with "importlib.invalidate_caches()" being the correct solution--but then that should be consistent and enforced, and not just something you may have to do sometimes in some cases but that you don't really need in other cases.
msg183288 - (view) Author: Erik Bray (erik.bray) * Date: 2013-03-01 20:38
I should add, Antoine's counter-example of a file that's renamed or replaced already doesn't work either under OSX.  So while this solution won't address that (even more narrow) use case, it would at least improve the existing situation for what I would like to think is a much more common use case.
msg183290 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-03-01 20:40
> I should add, Antoine's counter-example of a file that's renamed or
> replaced already doesn't work either under OSX.  So while this
> solution won't address that (even more narrow) use case, it would at
> least improve the existing situation for what I would like to think is
> a much more common use case.

Yes, it seems all possibilities have mixed outcomes. I'll let Brett
decide :-)
msg183297 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2013-03-01 22:00
So let's see what options we have on the table.

Status quo: picks up mutated modules as long as the mtime is granular enough. Fails in the face of added fails under the same conditions. Need to call importlib.invalidate_caches() to avoid this problem.

Add st_nlinks: Would help in the situation of an added/removed file. Doesn't fix mtime granularity problem. Still might need to call importlib.invalidate_caches() if doing more than adding files.

Drop implicit cache validation entirely: Force people to call importlib.invalidate_caches() no matter what (so no more accidental "it worked on my machine but not that other one"). Cuts down on stat call overhead by 1 per file-based import (which benchmarking suggests doesn't get us anything).

I'm leaning towards just dropping the implicit cache invalidation and forcing people to call importlib.invalidate_caches() when they muck with the underlying modules on sys.path/__path__. This probably would have helped when we started to run into similar problems in the stdlib's test suite when this whole caching mechanism was added. Since this lowers the chance of mistakenly thinking code will work when deployed on other filesystems and speed things up in the majority of cases where people are not dynamically generating files in sys.path.

I don't know if backporting is a good idea, though, as it might catch people by surprise who are not trying to run on different filesystems in 3.3.1. Then again, being lucky in not needing to call importlib.invalidate_caches() doesn't mean that starting to call it properly is a bad thing in a bugfix release.

Re-opening the bug under a different name while we think about this.
msg183305 - (view) Author: Eric Snow (eric.snow) * (Python committer) Date: 2013-03-02 00:21
Yanking the implicit cache validation does seem to be the simplest solution here.  Perhaps we could come up with a complex way of making this work, but I don't think it's worth it.

What may be worth it is making the admonition of "use invalidate_caches()" easier to find.  I imagine the situation like this:

1. dynamically generate module
2. try to import it and get ImportError
3. scratch head and make sure it's actually valid code, reboot, etc.
4. still not working
5. search google or Python docs for an answer

and step 5 isn't working out well enough.  Removing the implicit cache invalidation would only make a more reliable step 5 even more important.

How to make that happen?  I'm not sure.  Perhaps a dedicated "Trouble-shooting Imports" FAQ with a links to it on relevant pages like importlib, imp, and the import statement.  Maybe we already have something like that and I've just missed it.
msg183307 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-03-02 00:23
I would suggest checking on python-dev before removing the cache
freshness check. Some people may rely on it.
msg183309 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2013-03-02 02:31
Fine, I asked python-dev, but I said it's going to take a lot to convince me not to rip out the cache invalidation code.
msg183310 - (view) Author: Erik Bray (erik.bray) * Date: 2013-03-02 02:42
Dropping the implicit invalidation altogether works for me.  I was hoping to find a better solution because for the majority of use cases I personally care about the existing behavior is preferable.

That said I have to support OSX users as possibly the majority of my userbase, and I think that for anyone it's a big enough exception that it can't be ignored.

That being the case, explicit is better than implicit here.  I didn't have too much trouble finding importlib.invalidate_caches() once I realized what the problem was, but the fact that this change needs to be made could be better advertised I suppose.

Thanks.
msg183351 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2013-03-02 23:40
So Nick Coghlan pointed out that this would require a deprecation warning which is doable using ImportWarning. But that also lowers my motivation to bother since there is no performance benefit.

I will, though, at least make it clearer in the What's New doc and in the importlib docs that it is more of a need than an optional thing to do.
msg187954 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2013-04-28 03:24
http://hg.python.org/cpython/rev/75e32a0bfd74 and http://hg.python.org/cpython/rev/5fac0ac46f54 have more stronger wording for importlib.invalidate_caches(). What's New already says to call the function without being wishy-washy, so I didn't touch it.
msg187955 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2013-04-28 03:24
And I have decided to not bother removing the stat check.
History
Date User Action Args
2013-04-28 03:24:48brett.cannonsetmessages: + msg187955
2013-04-28 03:24:16brett.cannonsetstatus: open -> closed
resolution: fixed
messages: + msg187954
2013-03-10 20:49:44asvetlovsetnosy: + asvetlov
2013-03-04 19:18:47barrysetnosy: + barry
2013-03-02 23:40:00brett.cannonsetassignee: brett.cannon

messages: + msg183351
nosy: + ncoghlan
2013-03-02 02:42:08erik.braysetmessages: + msg183310
2013-03-02 02:31:30brett.cannonsetmessages: + msg183309
2013-03-02 00:23:25pitrousetmessages: + msg183307
2013-03-02 00:21:44eric.snowsetnosy: + eric.snow
messages: + msg183305
2013-03-01 22:00:13brett.cannonsetstatus: closed -> open
files: + drop_implicit_cache_invalidation.diff

components: + Interpreter Core, - None
title: Check st_nlink in addition to st_mtime to invalidate FileFinder cache -> Stop checking for directory cache invalidation in importlib
resolution: wont fix -> (no value)
versions: + Python 3.4, - Python 3.3
messages: + msg183297
stage: needs patch
2013-03-01 20:40:39pitrousetmessages: + msg183290
2013-03-01 20:38:47erik.braysetmessages: + msg183288
2013-03-01 20:17:26erik.braysetmessages: + msg183287
2013-03-01 20:08:36pitrousetmessages: + msg183286
2013-03-01 19:57:03erik.braysetmessages: + msg183285
2013-03-01 19:51:26erik.braysetmessages: + msg183284
2013-03-01 19:33:51brett.cannonsetstatus: open -> closed
resolution: wont fix
messages: + msg183281
2013-03-01 19:06:38pitrousetmessages: + msg183276
2013-03-01 18:56:42brett.cannonsetnosy: + brett.cannon, pitrou
messages: + msg183273
2013-03-01 18:38:34erik.braysetfiles: + 94e671589c0e.diff
keywords: + patch
2013-03-01 18:38:11erik.braycreate