classification
Title: PyPycLoader can't read cached .pyc files
Type: behavior Stage: commit review
Components: Library (Lib) Versions: Python 3.3
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: brett.cannon Nosy List: Arfrever, Marc.Abramowitz, Ronan.Lamy, brett.cannon, georg.brandl, jcea, loewis, pitrou
Priority: release blocker Keywords: patch

Created on 2012-06-07 18:29 by Ronan.Lamy, last changed 2012-07-03 20:54 by brett.cannon. This issue is now closed.

Files
File name Uploaded Description Edit
test_PyPyc.diff Ronan.Lamy, 2012-06-07 18:29
cpython-issue-15030.patch Marc.Abramowitz, 2012-06-27 05:36 review
cpython-issue-15030.patch Marc.Abramowitz, 2012-06-27 14:16 review
cpython-issue-15030.patch Marc.Abramowitz, 2012-06-27 14:30 review
cpython-issue-15030.patch Marc.Abramowitz, 2012-06-27 21:04 review
cpython-issue-15030.patch Marc.Abramowitz, 2012-06-27 21:07 review
cpython-issue-15030.patch Marc.Abramowitz, 2012-06-27 21:08 review
cpython-issue-15030.patch Marc.Abramowitz, 2012-06-28 15:40 review
Misc_ACKS.patch Marc.Abramowitz, 2012-07-02 20:55 review
Messages (29)
msg162486 - (view) Author: Ronan Lamy (Ronan.Lamy) * Date: 2012-06-07 18:29
PyPycLoader can't read or write .pyc files created by the core import machinery. I'm attaching a failing test case demonstrating the issue: if you import a .py file using standard mechanisms, thus creating a .pyc, and then (in a separate process, say) attempt to make use of that cached file with PyPycLoader, the import fails with 'ValueError: bad marshal data (unknown type code)'.

It looks like that there has been a change in the binary format of .pyc files but PyPycLoader wasn't updated.
msg162488 - (view) Author: Jesús Cea Avión (jcea) * (Python committer) Date: 2012-06-07 19:09
Ronan, can you confirm if 2.7 or 3.2 are affected too?
msg162491 - (view) Author: Ronan Lamy (Ronan.Lamy) * Date: 2012-06-07 19:39
2.7 doesn't have PyPycLoader. For 3.2, it's such a pain to create a working concrete subclass of PyPycLoader that I gave up. But just by reading the code, it doesn't seem affected, as there's no discrepancy with importlib._bootstrap: they both consider that metadata are the first 8 bytes (in default, _bootstrap switched to using 12 bytes).
msg162503 - (view) Author: Jesús Cea Avión (jcea) * (Python committer) Date: 2012-06-07 21:28
Could you write a patch for 3.3?
msg162512 - (view) Author: Ronan Lamy (Ronan.Lamy) * Date: 2012-06-08 01:03
My preferred solution would be to use to use the functions I describe in issue 15031.
msg162518 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2012-06-08 02:57
The format did change and importlib.abc.PyPycLoader was not subsequently updated. Problem is that the ABC has been deprecated and given backwards-compatibility instructions for Python 3.1, so I don't know if it should be considered a priority to fix this to read .pyc files created by other loaders instead of just the .pyc files it creates on its own when it implements write_bytecode().
msg162537 - (view) Author: Ronan Lamy (Ronan.Lamy) * Date: 2012-06-08 17:14
Well, working on a deprecated ABC is certainly low-priority, but it seems rather bad to have 2 incompatible implementations of the .pyc format in the stdlib. If this is to stay like this, it should at least come with a strong warning that users of PyPycLoader must invent their own PEP 3147 magic tag, which opens the can of worms of supporting multiple .pyc formats within the same interpreter in the stdlib. 

It seems easier to just fix it to be compatible with the one blessed .pyc format, as was the case in 3.2.
msg162544 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2012-06-08 18:17
Yes, a fix would be easiest, but someone has to write the fix. =) I will try to get to it before the first beta (unless Antoine, who made the .pyc change, wants to do it =).
msg164121 - (view) Author: Marc Abramowitz (Marc.Abramowitz) * Date: 2012-06-27 05:36
Attaching a patch...

Using Ronan's test_PyPyc.diff, before my patch:

{{{
~/dev/hg-repos/cpython$ ./python.exe -m unittest Lib/importlib/test/source/test_abc_loader.py 
...........................................E..................
======================================================================
ERROR: test_pyc_compatibility (Lib.importlib.test.source.test_abc_loader.RegeneratedBytecodeTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "./Lib/importlib/test/source/util.py", line 23, in wrapper
    to_return = fxn(*args, **kwargs)
  File "./Lib/importlib/test/source/test_abc_loader.py", line 473, in test_pyc_compatibility
    mock.load_module(name)
  File "<frozen importlib._bootstrap>", line 760, in load_module
  File "<frozen importlib._bootstrap>", line 408, in module_for_loader_wrapper
  File "<frozen importlib._bootstrap>", line 636, in _load_module
  File "./Lib/importlib/test/source/test_abc_loader.py", line 201, in get_code
    code_object = super().get_code(name)
  File "/Users/marca/dev/hg-repos/cpython/Lib/importlib/abc.py", line 305, in get_code
    return marshal.loads(bytecode)
ValueError: bad marshal data (unknown type code)

----------------------------------------------------------------------
Ran 62 tests in 0.076s

FAILED (errors=1)
[114118 refs]
}}}

After my patch:

{{{
~/dev/hg-repos/cpython$ patch -p1 < ~/Desktop/cpython-issue-15030.patch 
patching file Lib/importlib/abc.py
~/dev/hg-repos/cpython$ ./python.exe -m unittest Lib/importlib/test/source/test_abc_loader.py 
..............................................................
----------------------------------------------------------------------
Ran 62 tests in 0.079s

OK
[114118 refs]
}}}
msg164122 - (view) Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2012-06-27 06:16
Patch looks innocent enough; would be nice to fix this for 3.3.
msg164123 - (view) Author: Marc Abramowitz (Marc.Abramowitz) * Date: 2012-06-27 06:18
Similar issue in distribute: https://bitbucket.org/tarek/distribute/issue/283/bdist_egg-issues-with-python-330ax
msg164159 - (view) Author: Marc Abramowitz (Marc.Abramowitz) * Date: 2012-06-27 14:16
Updated patch based on feedback from Brett (thanks!)
msg164160 - (view) Author: Marc Abramowitz (Marc.Abramowitz) * Date: 2012-06-27 14:30
Third revision of my patch based on additional feedback from Brett (thanks!)...
msg164162 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2012-06-27 14:40
Patch looks good, Marc. Can you sign a PSF contributor agreement and send it in (http://www.python.org/psf/contrib/)?
msg164164 - (view) Author: Marc Abramowitz (Marc.Abramowitz) * Date: 2012-06-27 15:05
Brett, I just emailed the contributor agreement.
msg164169 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2012-06-27 16:50
I think the loader should just unconditionally assume 12 bytes header. This is meant to work only for the exact version it ships with, so it just has to match.
msg164171 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-06-27 16:57
> I think the loader should just unconditionally assume 12 bytes header. 
> This is meant to work only for the exact version it ships with, so it
> just has to match.

Agreed with Martin.
msg164189 - (view) Author: Marc Abramowitz (Marc.Abramowitz) * Date: 2012-06-27 19:23
Hmmm if I simply do:

diff -r b66e82c9f852 Lib/importlib/abc.py
--- a/Lib/importlib/abc.py	Tue Jun 26 23:05:27 2012 +0200
+++ b/Lib/importlib/abc.py	Wed Jun 27 12:15:55 2012 -0700
@@ -282,7 +282,7 @@
                 if len(raw_timestamp) < 4:
                     raise EOFError("bad timestamp in {}".format(fullname))
                 pyc_timestamp = _bootstrap._r_long(raw_timestamp)
-                bytecode = data[8:]
+                bytecode = data[12:]
                 # Verify that the magic number is valid.
                 if imp.get_magic() != magic:
                     raise ImportError(

then I get two "ValueError: bad marshal data (unknown type code)" test errors in test_abc_loader.

I am unsure as to whether this is a bug in the test or the implementation.

The following quells the errors, but I am not all confident that it's correct:

@@ -302,7 +302,10 @@
                     raise
             else:
                 # Bytecode seems fine, so try to use it.
-                return marshal.loads(bytecode)
+                try:
+                    return marshal.loads(bytecode)
+                except ValueError:
+                    return ''
         elif source_timestamp is None:
             raise ImportError("no source or bytecode available to create code "
                               "object for {0!r}".format(fullname),
msg164192 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-06-27 19:49
> I am unsure as to whether this is a bug in the test or the implementation.

You should probably investigate a bit more. We don't want to silence exceptions if there's not a good reason to.
msg164194 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2012-06-27 20:12
The patch could be updated to ignore the 12 bytes unconditionally when reading bytecode but still write out the accurate file size (when possible). That should be fully compatible within Python 3.3 no matter who generated the bytecode (PyPycLoader or import).
msg164199 - (view) Author: Marc Abramowitz (Marc.Abramowitz) * Date: 2012-06-27 21:04
Here's a patch that unconditionally switches over to the 12 byte format. I'm assuming the "size" in data[8:12] is the length of the bytecode?
msg164201 - (view) Author: Marc Abramowitz (Marc.Abramowitz) * Date: 2012-06-27 21:07
Oops, last attachment included the source timestamp twice instead of timestamp + bytecode size.
msg164202 - (view) Author: Marc Abramowitz (Marc.Abramowitz) * Date: 2012-06-27 21:08
Oops. Refactor. :-)
msg164265 - (view) Author: Marc Abramowitz (Marc.Abramowitz) * Date: 2012-06-28 15:40
I don't know if I'll have time soon to do the tweaks to Ronan's test (and maybe he wants to do them himself anyway?), but here's the correction of the size calculation in the header (from size of bytecode to size of source -- thanks, Brett!)
msg164525 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2012-07-02 18:39
Changeset b7463ec1980c has the fix and test update (went with a simple solution for the tests which uses the mock to verify). Thanks to Ronan and Marc for helping out!
msg164550 - (view) Author: Marc Abramowitz (Marc.Abramowitz) * Date: 2012-07-02 20:55
Hi Brett, can I get an ack in Misc/ACKS please (to make my Mom proud :-))? Attaching patch.
msg164556 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2012-07-02 21:40
Marc, can you please submit a contributor form?
msg164560 - (view) Author: Marc Abramowitz (Marc.Abramowitz) * Date: 2012-07-02 22:15
Hi Martin,

I already did. See http://bugs.python.org/msg164162 and http://bugs.python.org/msg164164. Maybe it's not "on file" yet?
msg164624 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2012-07-03 20:54
Maybe you can try emailing in the form again, Marc? Let me know when you have and if it is lost again I will bug the proper people.

Anyway, the original patch I committed added you to Misc/ACKS: http://hg.python.org/cpython/file/56260d30985d/Misc/ACKS#l16
History
Date User Action Args
2012-07-03 20:54:20brett.cannonsetmessages: + msg164624
2012-07-02 22:15:13Marc.Abramowitzsetmessages: + msg164560
2012-07-02 21:40:45loewissetmessages: + msg164556
2012-07-02 20:55:47Marc.Abramowitzsetfiles: + Misc_ACKS.patch

messages: + msg164550
2012-07-02 18:39:16brett.cannonsetstatus: open -> closed
resolution: fixed
dependencies: - Split .pyc parsing from module loading
messages: + msg164525
2012-06-28 15:40:46Marc.Abramowitzsetfiles: + cpython-issue-15030.patch

messages: + msg164265
2012-06-27 21:08:48Marc.Abramowitzsetfiles: + cpython-issue-15030.patch

messages: + msg164202
2012-06-27 21:07:12Marc.Abramowitzsetfiles: + cpython-issue-15030.patch

messages: + msg164201
2012-06-27 21:04:58Marc.Abramowitzsetfiles: + cpython-issue-15030.patch

messages: + msg164199
2012-06-27 20:12:43brett.cannonsetmessages: + msg164194
2012-06-27 19:49:58pitrousetmessages: + msg164192
2012-06-27 19:23:52Marc.Abramowitzsetmessages: + msg164189
2012-06-27 16:57:26pitrousetmessages: + msg164171
2012-06-27 16:50:56loewissetnosy: + loewis
messages: + msg164169
2012-06-27 15:59:19Arfreversetnosy: + Arfrever
2012-06-27 15:05:39Marc.Abramowitzsetmessages: + msg164164
2012-06-27 14:40:33brett.cannonsetassignee: brett.cannon
messages: + msg164162
stage: commit review
2012-06-27 14:30:31Marc.Abramowitzsetfiles: + cpython-issue-15030.patch

messages: + msg164160
2012-06-27 14:16:42Marc.Abramowitzsetfiles: + cpython-issue-15030.patch

messages: + msg164159
2012-06-27 06:18:30Marc.Abramowitzsetmessages: + msg164123
2012-06-27 06:16:37georg.brandlsetpriority: normal -> release blocker
nosy: + georg.brandl
messages: + msg164122

2012-06-27 05:36:57Marc.Abramowitzsetfiles: + cpython-issue-15030.patch
nosy: + Marc.Abramowitz
messages: + msg164121

2012-06-08 18:17:25brett.cannonsetmessages: + msg162544
2012-06-08 17:14:57Ronan.Lamysetmessages: + msg162537
2012-06-08 02:57:10brett.cannonsetnosy: + pitrou
messages: + msg162518
2012-06-08 01:31:32jceasetdependencies: + Split .pyc parsing from module loading
2012-06-08 01:03:56Ronan.Lamysetmessages: + msg162512
2012-06-07 21:28:55jceasetmessages: + msg162503
2012-06-07 19:39:25Ronan.Lamysetmessages: + msg162491
2012-06-07 19:09:36jceasetnosy: + jcea
messages: + msg162488
2012-06-07 18:29:04Ronan.Lamycreate