classification
Title: broken pyc files
Type: enhancement
Stage:
Components: Interpreter Core
Versions: Python 3.1, Python 2.7
process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To:
Nosy List: arigo, exarkun, loewis, pitrou, zseil
Priority: normal
Keywords: patch

Created on 2005-04-10 13:10 by arigo, last changed 2009-01-06 19:17 by pitrou. This issue is now closed.

Files
File name Uploaded Description
update_co_filename.diff zseil, 2007-04-24 10:01 patch against trunk revision 54933
update_co_filename.diff exarkun, 2009-01-05 17:11
update_co_filename.diff exarkun, 2009-01-05 17:49
Messages (13)
msg24985 - Author: Armin Rigo (arigo) (Python committer) Date: 2005-04-10 13:10
In a number of situations, the .pyc files can become "corrupted" in a subtle way: the co_filename attribute of the code objects they contain becomes wrong.  This can occur if we move or rename directories, or if we access the same set of files from two different locations (e.g. over NFS).

This corruption doesn't prevent the .pyc files from working, but the interpreter loses the reference to the source file.  It causes trouble in tracebacks, in the inspect module, etc.

A simple fix would be to use the following logic when importing a .py file: if there is a corresponding .pyc file, in addition to checking the timestamp, check the co_filename attribute of the loaded object.  If it doesn't point to the original .py file, discard the code object and ignore the .pyc file.

Alternatively, we could force all co_filenames to point to the .py file when loading the .pyc file.

I'll write a patch for whichever alternative seems better.
msg24986 - Author: Martin v. Löwis (loewis) (Python committer) Date: 2007-03-28 12:01
I fail to see the corruption. It is quite desirable and normal to only ship pyc files - that the file name they refer to is actually present is not a requirement at all.
msg24987 - Author: Armin Rigo (arigo) (Python committer) Date: 2007-03-28 13:40
What I called "corruption" is the situation
where both the .py and the .pyc files are
present, but the filename stored in the .pyc
co_filenames is no longer the valid absolute
path of the corresponding .py file, for any
reason (renaming, NFS views, etc.).

This situation causes the tracebacks and the
inspect module to fail to locate the .py file,
which I consider a bug.
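
A minimal sketch of the situation described above, assuming a hypothetical path /old/location/mod.py and using the marshal module (the serialization used inside .pyc files) in place of a real .pyc on disk. It only illustrates that the recorded filename travels with the bytecode; it is not taken from the tracker discussion.

```python
import marshal

# Hypothetical path where the module lived when it was compiled.
old_path = "/old/location/mod.py"
source = "def f():\n    return 1\n"

# Compile as if the source were at old_path, then round-trip through
# marshal, which is how code objects are stored inside a .pyc file.
code = compile(source, old_path, "exec")
restored = marshal.loads(marshal.dumps(code))

# The filename is baked into the module code object and into every
# nested code object, wherever the .py file lives now.
stale_name = restored.co_filename
nested_names = [
    const.co_filename
    for const in restored.co_consts
    if hasattr(const, "co_filename")
]
```

If the directory is later renamed, nothing in the stored data changes, so tracebacks and inspect keep looking for the old path.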
msg24988 - Author: Ziga Seilnacht (zseil) (Python committer) Date: 2007-04-03 07:16
This problem is reported quite often in the tracker,
although it shows up in different places:

http://www.python.org/sf/1666807
http://www.python.org/sf/1051638

I closed those bugs as duplicates of this one.

The logging package is also affected:

http://www.python.org/sf/1669498
http://www.python.org/sf/1633605
http://www.python.org/sf/1616422
msg24989 - Author: Armin Rigo (arigo) (Python committer) Date: 2007-04-03 11:31
If you ask me, I think that when the import
system finds both a .py and a .pyc for a module,
then it should ignore all co_filenames and replace
them with the real path of the .py file.  I can't
see any point in not doing so.

There are many other quirks caused by .pyc files
accidentally remaining around, but we cannot fix them
all as long as the .pyc files are at the same time
a cache for performance reasons and a redistributable
program format (e.g. if "rm x.py" or "svn up" deletes
a .py file, then the module is still importable via
the .pyc left behind, a great way to overlook the fact
that imports elsewhere in the project need to be
updated).
msg24990 - Author: Ziga Seilnacht (zseil) (Python committer) Date: 2007-04-03 13:46
Wouldn't your first solution be simpler? Changing all
co_filenames would require either changing various
marshal.c functions, or traversing the code object
returned by import.c/read_compiled_module().

Discarding the compiled code when the file names don't
match would be simpler and only require minor changes
in import.c/load_source_module().
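
The "traversing the code object" option mentioned above can be sketched in pure Python. This is only an illustration, not the C patch attached to this issue: the helper name update_co_filename is borrowed from the patch's file name, and the sketch assumes Python 3.8 or later for code.replace(). It builds new code objects rather than mutating existing ones.

```python
import types


def update_co_filename(code, new_filename):
    """Return a copy of `code` whose co_filename is `new_filename`.

    Nested code objects (functions, classes, comprehensions) are stored
    in co_consts, so they are rewritten recursively as well.
    """
    new_consts = tuple(
        update_co_filename(const, new_filename)
        if isinstance(const, types.CodeType)
        else const
        for const in code.co_consts
    )
    return code.replace(co_filename=new_filename, co_consts=new_consts)
```

Because the result is a fresh object held only in memory, nothing is written back to the .pyc file.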
msg24991 - Author: Ziga Seilnacht (zseil) (Python committer) Date: 2007-04-24 10:01
Here is a patch that implements arigo's last suggestion.

File Added: update_co_filename.diff
msg24992 - Author: Armin Rigo (arigo) (Python committer) Date: 2007-05-02 19:42
It's an obscure detail, but I think that the
.pyc file should not be rewritten again after we
fix the co_filenames.  Fixing the co_filenames
is a very very cheap operation, and I can imagine
cases where the same .py files are accessed from
what appears to be two different paths, e.g. over
NFS - this would cause .pyc files to be rewritten
all the time, which is particularly bad if we
have the example of NFS in mind.  Not to mention
that two python processes trying to write
*different* data to the same .pyc file at the
same time are going to create a mess, ending in
a segfault the next time the broken .pyc is
loaded.

It's overall a mess, so let's play it safe.
msg61587 - Author: Antoine Pitrou (pitrou) (Python committer) Date: 2008-01-23 16:04
If code objects grew a __module__ attribute (which functions already
have), wouldn't it be just a matter of falling back on 
sys.modules[my_code_object.__module__].__file__ when
my_code_object.co_filename points to a non-existent file?
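
A rough sketch of the fallback proposed above. Code objects have no __module__ attribute, so this illustration starts from a function object instead, which does have one; the helper name source_file_for is hypothetical.

```python
import os
import sys


def source_file_for(func):
    """Best-effort lookup of the source file for a Python function.

    Prefer the filename recorded in the code object; if that path no
    longer exists, fall back on the __file__ of the defining module.
    """
    recorded = func.__code__.co_filename
    if os.path.exists(recorded):
        return recorded
    module = sys.modules.get(func.__module__)
    # Returns None when the module is gone or has no __file__.
    return getattr(module, "__file__", None)
```

Note that this still has to probe the filesystem for the recorded path on every lookup.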
msg79167 - Author: Jean-Paul Calderone (exarkun) (Python committer) Date: 2009-01-05 17:11
This is causing problems for me as well.  The attached patch no longer
applies cleanly to trunk.  I've attached an updated version which
addresses the conflicts.  The new behavior fixes the issues I have with
the current behavior.  It'd be great to have it applied.

> If code objects grew a __module__ attribute (which functions already
> have), wouldn't it be just a matter of falling back on 
> sys.modules[my_code_object.__module__].__file__ when
> my_code_object.co_filename points to a non-existent file?

It'd be nice if it wasn't necessary to check to see if co_filename
referred to an existing file.  Can we have a solution which creates one
definitive, correct way to determine the source file?
msg79173 - Author: Antoine Pitrou (pitrou) (Python committer) Date: 2009-01-05 17:41
As Armin said, I think it's safer and simpler not to rewrite the pyc
file when the filenames have been changed.
(If you think changing the filenames can have a significant performance
impact, you may want to benchmark it.)
msg79175 - Author: Jean-Paul Calderone (exarkun) (Python committer) Date: 2009-01-05 17:49
New version of the patch which doesn't rewrite pyc files attached.
msg79279 - Author: Antoine Pitrou (pitrou) (Python committer) Date: 2009-01-06 19:17
Committed to trunk and py3k, and backported to 2.6 and 3.0. Thanks!
History
Date User Action Args
2009-01-06 19:17:28 pitrou set status: open -> closed
resolution: accepted -> fixed
messages: + msg79279
2009-01-06 17:59:48 pitrou set resolution: accepted
versions: + Python 3.1, Python 2.7, - Python 2.6
2009-01-05 17:49:16 exarkun set files: + update_co_filename.diff
messages: + msg79175
2009-01-05 17:41:56 pitrou set messages: + msg79173
2009-01-05 17:11:10 exarkun set files: + update_co_filename.diff
nosy: + exarkun
messages: + msg79167
keywords: + patch
2009-01-05 16:55:35 amaury.forgeotdarc link issue4845 superseder
2008-01-23 16:04:20 pitrou set nosy: + pitrou
messages: + msg61587
2008-01-05 20:17:24 christian.heimes set type: enhancement
versions: + Python 2.6
2005-04-10 13:10:52 arigo create