msg183270 - (view) |
Author: Erik Bray (erik.bray) * |
Date: 2013-03-01 18:38 |
This is vaguely related to issue14067, though the patch suggested there would make this problem worse, not better.
This addresses a problem that cropped up on OSX in which some code that, for Good Reasons, generates a module in a package directory and then expects to be able to import it. Because HFS+ does not offer sub-second resolution for mtime the generated module is not found. We didn't have this problem on Linux or Windows.
I've attached one possible solution to the issue, which uses a tuple of st_mtime and st_nlink to determine if the cache should be invalidated. This should pick up most file creation/deletion on such file systems even if the directory's mtime hasn't changed. This doesn't add any additional system calls so it shouldn't have any significant performance impact.
In the meantime importlib.invalidate_caches() offers a consistent workaround, but it was surprising when this issue was found to be platform dependent.
|
msg183273 - (view) |
Author: Brett Cannon (brett.cannon) * |
Date: 2013-03-01 18:56 |
[for anyone else who doesn't know what st_nlink is for: http://docs.python.org/3/library/stat.html#stat.ST_NLINK]
Since it doesn't add stat overhead as it's just part of what os.path.stats() returns I'm fine with adding it to part of the invalidation scheme. What do you think, Antoine?
|
msg183276 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2013-03-01 19:06 |
Well, importlib.invalidate_caches() is not a workaround, it's really the right way to solve your issue (for example if instead of writing a new module, you modify an existing one: st_nlink will then remain the same).
|
msg183281 - (view) |
Author: Brett Cannon (brett.cannon) * |
Date: 2013-03-01 19:33 |
Antoine does make a good point: adding st_nlink could lead to some hard-to-discover bugs because something will work the first time when the file is added but then subsequently fail if you are modifying the new file in-place.
I'm going to go with Antoine's thinking and close this bug as "won't fix". Thanks for at least taking a stab at the code, Erik.
|
msg183284 - (view) |
Author: Erik Bray (erik.bray) * |
Date: 2013-03-01 19:51 |
Why should modifying the file in place be expected to do anything with respect to the directory cache? If that module has already been imported then modifying it should not require the cache to be invalidated.
If the file is modified *before* it's imported that's still irrelevant because that means the file exists in the directory and will be included in the cache when it gets generated.
Relying on invalidate_caches() is a workaround because it's not even necessary on better filesystems--the existing behavior already leads to hard to find bugs.
|
msg183285 - (view) |
Author: Erik Bray (erik.bray) * |
Date: 2013-03-01 19:57 |
Put another way, the cache associated with a FileFinder only keeps track of the filenames in a directory, and not their individual mtimes. So if a new file is added to the directory the cache should be invalided. Likewise if a file is removed.
If a file is modified or removed and a new file with the same name subsequently added this is irrelevant to the cache. Importing that module will still succeed if it hasn't already been imported. If that module's file is *not* in the cache then an ImportError is raised.
|
msg183286 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2013-03-01 20:08 |
> Put another way, the cache associated with a FileFinder only keeps
> track of the filenames in a directory, and not their individual
> mtimes. So if a new file is added to the directory the cache should
> be invalided. Likewise if a file is removed.
True, but if a file is removed and another added, st_nlink won't change.
Which will lead to even harder to find bugs.
I agree it would be better if it were always possible to detect
directory modifications, without any false negatives. Sadly it is not
(blame POSIX, I guess, although Windows is not better here), hence
invalidate_caches().
|
msg183287 - (view) |
Author: Erik Bray (erik.bray) * |
Date: 2013-03-01 20:17 |
Fair enough, but then I would propose removing the automatic cache invalidation based on mtime altogether. It's just not reliable enough--this caused previously working code to break unexpectedly, and only on a single platform at that.
I'm fine with "importlib.invalidate_caches()" being the correct solution--but then that should be consistent and enforced, and not just something you may have to do sometimes in some cases but that you don't really need in other cases.
|
msg183288 - (view) |
Author: Erik Bray (erik.bray) * |
Date: 2013-03-01 20:38 |
I should add, Antoine's counter-example of a file that's renamed or replaced already doesn't work either under OSX. So while this solution won't address that (even more narrow) use case, it would at least improve the existing situation for what I would like to think is a much more common use case.
|
msg183290 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2013-03-01 20:40 |
> I should add, Antoine's counter-example of a file that's renamed or
> replaced already doesn't work either under OSX. So while this
> solution won't address that (even more narrow) use case, it would at
> least improve the existing situation for what I would like to think is
> a much more common use case.
Yes, it seems all possibilities have mixed outcomes. I'll let Brett
decide :-)
|
msg183297 - (view) |
Author: Brett Cannon (brett.cannon) * |
Date: 2013-03-01 22:00 |
So let's see what options we have on the table.
Status quo: picks up mutated modules as long as the mtime is granular enough. Fails in the face of added fails under the same conditions. Need to call importlib.invalidate_caches() to avoid this problem.
Add st_nlinks: Would help in the situation of an added/removed file. Doesn't fix mtime granularity problem. Still might need to call importlib.invalidate_caches() if doing more than adding files.
Drop implicit cache validation entirely: Force people to call importlib.invalidate_caches() no matter what (so no more accidental "it worked on my machine but not that other one"). Cuts down on stat call overhead by 1 per file-based import (which benchmarking suggests doesn't get us anything).
I'm leaning towards just dropping the implicit cache invalidation and forcing people to call importlib.invalidate_caches() when they muck with the underlying modules on sys.path/__path__. This probably would have helped when we started to run into similar problems in the stdlib's test suite when this whole caching mechanism was added. Since this lowers the chance of mistakenly thinking code will work when deployed on other filesystems and speed things up in the majority of cases where people are not dynamically generating files in sys.path.
I don't know if backporting is a good idea, though, as it might catch people by surprise who are not trying to run on different filesystems in 3.3.1. Then again, being lucky in not needing to call importlib.invalidate_caches() doesn't mean that starting to call it properly is a bad thing in a bugfix release.
Re-opening the bug under a different name while we think about this.
|
msg183305 - (view) |
Author: Eric Snow (eric.snow) * |
Date: 2013-03-02 00:21 |
Yanking the implicit cache validation does seem to be the simplest solution here. Perhaps we could come up with a complex way of making this work, but I don't think it's worth it.
What may be worth it is making the admonition of "use invalidate_caches()" easier to find. I imagine the situation like this:
1. dynamically generate module
2. try to import it and get ImportError
3. scratch head and make sure it's actually valid code, reboot, etc.
4. still not working
5. search google or Python docs for an answer
and step 5 isn't working out well enough. Removing the implicit cache invalidation would only make a more reliable step 5 even more important.
How to make that happen? I'm not sure. Perhaps a dedicated "Trouble-shooting Imports" FAQ with a links to it on relevant pages like importlib, imp, and the import statement. Maybe we already have something like that and I've just missed it.
|
msg183307 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2013-03-02 00:23 |
I would suggest checking on python-dev before removing the cache
freshness check. Some people may rely on it.
|
msg183309 - (view) |
Author: Brett Cannon (brett.cannon) * |
Date: 2013-03-02 02:31 |
Fine, I asked python-dev, but I said it's going to take a lot to convince me not to rip out the cache invalidation code.
|
msg183310 - (view) |
Author: Erik Bray (erik.bray) * |
Date: 2013-03-02 02:42 |
Dropping the implicit invalidation altogether works for me. I was hoping to find a better solution because for the majority of use cases I personally care about the existing behavior is preferable.
That said I have to support OSX users as possibly the majority of my userbase, and I think that for anyone it's a big enough exception that it can't be ignored.
That being the case, explicit is better than implicit here. I didn't have too much trouble finding importlib.invalidate_caches() once I realized what the problem was, but the fact that this change needs to be made could be better advertised I suppose.
Thanks.
|
msg183351 - (view) |
Author: Brett Cannon (brett.cannon) * |
Date: 2013-03-02 23:40 |
So Nick Coghlan pointed out that this would require a deprecation warning which is doable using ImportWarning. But that also lowers my motivation to bother since there is no performance benefit.
I will, though, at least make it clearer in the What's New doc and in the importlib docs that it is more of a need than an optional thing to do.
|
msg187954 - (view) |
Author: Brett Cannon (brett.cannon) * |
Date: 2013-04-28 03:24 |
http://hg.python.org/cpython/rev/75e32a0bfd74 and http://hg.python.org/cpython/rev/5fac0ac46f54 have more stronger wording for importlib.invalidate_caches(). What's New already says to call the function without being wishy-washy, so I didn't touch it.
|
msg187955 - (view) |
Author: Brett Cannon (brett.cannon) * |
Date: 2013-04-28 03:24 |
And I have decided to not bother removing the stat check.
|
|
Date |
User |
Action |
Args |
2022-04-11 14:57:42 | admin | set | github: 61532 |
2013-04-28 03:24:48 | brett.cannon | set | messages:
+ msg187955 |
2013-04-28 03:24:16 | brett.cannon | set | status: open -> closed resolution: fixed messages:
+ msg187954
|
2013-03-10 20:49:44 | asvetlov | set | nosy:
+ asvetlov
|
2013-03-04 19:18:47 | barry | set | nosy:
+ barry
|
2013-03-02 23:40:00 | brett.cannon | set | assignee: brett.cannon
messages:
+ msg183351 nosy:
+ ncoghlan |
2013-03-02 02:42:08 | erik.bray | set | messages:
+ msg183310 |
2013-03-02 02:31:30 | brett.cannon | set | messages:
+ msg183309 |
2013-03-02 00:23:25 | pitrou | set | messages:
+ msg183307 |
2013-03-02 00:21:44 | eric.snow | set | nosy:
+ eric.snow messages:
+ msg183305
|
2013-03-01 22:00:13 | brett.cannon | set | status: closed -> open files:
+ drop_implicit_cache_invalidation.diff
components:
+ Interpreter Core, - None title: Check st_nlink in addition to st_mtime to invalidate FileFinder cache -> Stop checking for directory cache invalidation in importlib resolution: wont fix -> (no value) versions:
+ Python 3.4, - Python 3.3 messages:
+ msg183297 stage: needs patch |
2013-03-01 20:40:39 | pitrou | set | messages:
+ msg183290 |
2013-03-01 20:38:47 | erik.bray | set | messages:
+ msg183288 |
2013-03-01 20:17:26 | erik.bray | set | messages:
+ msg183287 |
2013-03-01 20:08:36 | pitrou | set | messages:
+ msg183286 |
2013-03-01 19:57:03 | erik.bray | set | messages:
+ msg183285 |
2013-03-01 19:51:26 | erik.bray | set | messages:
+ msg183284 |
2013-03-01 19:33:51 | brett.cannon | set | status: open -> closed resolution: wont fix messages:
+ msg183281
|
2013-03-01 19:06:38 | pitrou | set | messages:
+ msg183276 |
2013-03-01 18:56:42 | brett.cannon | set | nosy:
+ brett.cannon, pitrou messages:
+ msg183273
|
2013-03-01 18:38:34 | erik.bray | set | files:
+ 94e671589c0e.diff keywords:
+ patch |
2013-03-01 18:38:11 | erik.bray | create | |