classification
Title: Python can sometimes create incorrect .pyc files
Type: behavior Stage: resolved
Components: Interpreter Core Versions: Python 2.7
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: benjamin.peterson, cstratak, eric.snow, meador.inge, ncoghlan, petr.viktorin, serhiy.storchaka, tzickel, vstinner
Priority: normal Keywords: patch

Created on 2015-09-13 11:26 by tzickel, last changed 2018-11-24 21:47 by vstinner. This issue is now closed.

Files
File name Uploaded Description Edit
ferror_check_in_tokenizer.patch tzickel, 2015-09-18 16:41 review
import_io_check.sh tzickel, 2017-10-21 16:04
Pull Requests
URL Status Linked Edit
PR 8449 merged tzickel, 2018-07-24 19:08
Messages (24)
msg250556 - (view) Author: (tzickel) * Date: 2015-09-13 11:26
I had a non-reproducible issue occur a few times in which python 2.7.9 would produce .pyc files with empty code objects on a network drive under windows. The .pyc might have been created due to intermittent network errors that are hard to reproduce reliably. The .pyc files would override the previous correct .pyc files that existed in the same place.

The incorrect .pyc is a valid file, but instead of having the code object of the original .py file compiled, it would have the code object of an empty .py file. Python would then go on to use the incorrect .pyc file until it is manually deleted.

This peculiar .pyc files got me thinking about how cpython can produce such an incorrect .pyc file instead of failing.

The main issue here is that getc function, returns EOF both on EOF and on file error. It seems as if the tokenizer starts reading the file stream, and gets an EOF directly, it would not check if it resulted from actually reading an empty file or because of an file error, and happily return an empty AST which would be then compiled to a bad empty code .pyc instead of aborting the process because of an file error.
msg250617 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2015-09-14 04:42
Inspecting the 2.7 code:

https://hg.python.org/cpython/file/2.7/Python/import.c#l761
following that down to 
https://hg.python.org/cpython/file/2.7/Python/marshal.c#l1126

it looks like it does the right thing on EOF error (from either getc or from fread) and raises an exception which aborts load_compiled_module.
msg250619 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2015-09-14 04:42
we'll need a test case .pyc where this happens.
msg250653 - (view) Author: (tzickel) * Date: 2015-09-14 10:49
You are not looking at the correct code, the function you are pointing to, check_compiled_module is run to check the existing .pyc (it is a good question, why the .pyc is overriden, but that is a secondary issue, which I cannot reproduce as I've said by demand).

I am talking about the code which creates a new (and incorrect) .pyc in parse_source_module:
https://hg.python.org/cpython/file/2.7/Python/import.c#l861
calls in the end to Py_UniversalNewlineFgets
https://hg.python.org/cpython/file/2.7/Objects/fileobject.c#l2749
you can see that function will return NULL if it gets an EOF because of a file error, and then the tokenises which calls it will not know if it got NULL because of EOF or file error, and compile the AST and generate an incorrect .pyc file.
msg250775 - (view) Author: (tzickel) * Date: 2015-09-15 14:55
As for the "example" .pyc just create an empty 0 byte .py file and compile it, that is the same .pyc that is created in my system (instead in my case the .py is not empty).

Just so people don't have to trace the code like I did, here is the traceback of the primary issue. Remember that my hypothesis is that fopen returns an FILE stream, that returns EOF on the first get because of an I/O error, not because the file is empty:

--> GETC is called, and gets EOF on the first try, and thus Py_UniversalNewlineFgets returns NULL
  * frame #0: 0x0000000109fe4c44 Python`Py_UniversalNewlineFgets
    frame #1: 0x0000000109fc972c Python`decoding_fgets + 321
    frame #2: 0x0000000109fc9262 Python`tok_nextc + 918
    frame #3: 0x0000000109fc830e Python`PyTokenizer_Get + 171
    frame #4: 0x0000000109fc5853 Python`parsetok + 128
    frame #5: 0x000000010a066748 Python`PyParser_ASTFromFile + 109
-> Now load_source_module has an empty AST which will get compiled to an empty code module
-> this code is optmized so we don't see the parse_source_module call
here, but write_compiled_module will be called afterwards with the empty
AST tree instead of aborting...
    frame #6: 0x000000010a05e3b5 Python`load_source_module + 671
    frame #7: 0x000000010a05f003 Python`import_submodule + 270
    frame #8: 0x000000010a05ebc6 Python`load_next + 284
    frame #9: 0x000000010a05cb5d Python`PyImport_ImportModuleLevel + 453
    frame #10: 0x000000010a042641 Python`builtin___import__ + 135
....
msg251011 - (view) Author: (tzickel) * Date: 2015-09-18 16:41
Not sure why nobody has responded yet, but I have gone up and made a patch for the problem for 2.7 HEAD. Would be great if someone with more understanding of python's source could say if this is the optimal place to do the ferror test.

I am able to see that this patch fixes the issue (now the import fails on EOF with an error instead of producing an empty valid AST and an empty code object .pyc file)

Unfortunately testing that the patch fixes the issue, currently involves LD_PRELOADing an dynamic library which hooks __srget (because thats what the getc macro in Py_UniversalNewlineFgets uses in posix systems) to return EOF, and ferror to return a non zero result:

If the code for the hooking dynamic library is needed, please tell me (I can't figure out how to make an automated test for it).

-----

shell$ cat a.py
print 'hi'

Before fixing python:

shell$ ls a.py*
a.py
shell$ DYLD_FORCE_FLAT_NAMESPACE=1 DYLD_INSERT_LIBRARIES=libblah.dylib python
Python 2.7.10 (default, Jul 13 2015, 12:05:58)
[GCC 4.2.1 Compatible Apple LLVM 6.1.0 (clang-602.0.53)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import os
>>> os.environ['BORK_IO_TEST'] = 'a' # this activates the EOF / ferror
>>> import a # this should print 'hi' or fail but does not...
>>> a
<module 'a' from 'a.py'>
>>> exit()
shell$ ls a.py*
a.py   a.pyc

You can see that it accepted a.py as an empty file and a bad a.pyc was created.

After the patch:
shell$ ls a.py*
a.py

shell$ DYLD_FORCE_FLAT_NAMESPACE=1 DYLD_INSERT_LIBRARIES=libblah.dylib ./python.exe
Python 2.7.10+ (2.7:f6125114b55f+, Sep 18 2015, 19:18:34)
[GCC 4.2.1 Compatible Apple LLVM 7.0.0 (clang-700.0.72)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import os
>>> os.environ['BORK_IO_TEST'] = 'a'
>>> import a
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "a.py", line 1

    ^
SyntaxError: unexpected EOF while parsing
>>> a
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'a' is not defined
>>> exit()
shell$ ls a.py*
a.py

Now the import failed, and of course no empty code .pyc file was created.
msg251020 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2015-09-18 18:52
I have not looked at your patch, tzickel, but I just wanted to let you know a lack of reply just means lack of time on the part of the core developers. We are all volunteers and essentially don't get paid to spend our time to work on Python. Add in the fact that this is an odd edge case issue on Python 2.7 and that lowers the priority for someone to get to it when there are more pressing issues related to the new 3.5.0 release.

Hopefully someone who has a good understanding of the low-level C details you are talking about in your patch will be able to make a good judgment as to whether the test case and patch make sense.
msg251435 - (view) Author: (tzickel) * Date: 2015-09-23 17:16
Although I haven't reviewed python 3.5 code, I've put an breakpoint on calling "ferror" in the debugger, and it seems that python 3 does not check the file status on import as well...
msg253863 - (view) Author: (tzickel) * Date: 2015-11-01 18:31
TL:DR
Python 2 forgot to do I/O error checking when reading .py files from disk. On some rare situations this can bite Python in the ass and cause it to bork .pyc files.

Checked python 3, it checks the I/O in a different / better way.

Next python 2.7 is out in 1.5 month, I really want this fix to get in, did I forgot to nose some developer who can help out ?
msg253866 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2015-11-01 18:48
Nope, I think you probably have everyone who might be interested in looking at fixing this on the issue, tzickel. Just a matter of someone taking the interest and initiative to fix this in Python 2.7.
msg253878 - (view) Author: Meador Inge (meador.inge) * (Python committer) Date: 2015-11-01 21:44
I started poking at the patch a little and have a few comments.

My understanding of the issue comments is that the read error actually happens when reading in the *source* file and *not* the bytecode file.  This happens because 'ferror' is not checked after receiving an EOF and thus we think we just have an empty source file.  I can understand how creating a reproducible test case for this error path would be very difficult.

So, checking for errors with 'ferror' definitely seems reasonable, but why do it in the tokenizer code?  I already see several places in 'fileobject.c' that do similar checks.  For example, in 'get_line' I see:

   while ( buf != end && (c = GETC(fp)) != EOF ) {
      ...
   }
   if (c == EOF) {
      if (ferror(fp) && errno == EINTR) {
         ...
      }
   }

As such, wouldn't handling this error case directly in 'Py_UniversalNewlineFgets' similar to the above code be more appropriate?
msg253943 - (view) Author: (tzickel) * Date: 2015-11-02 19:10
1. You are correct the issue I am talking about is in parsing source files (Altough because python caches them as .pyc it's a worse situation).
2. The example you give is EINTR handling (which is mostly handling interrupted I/O operations by signals and retrying them) the real I/O error checking in that get_line is I belive in the next ferror check there. It might be nice to have EINTR checking (and retry) when reading the source file as well, but that is a secondary issue.
3. As for your recommendation for changing Py_UniversalNewlineFgets, you can see that both it's documentation says "Note that we need no error handling: fgets() treats error and eof identically." and since it seems like a low-level function that does not have any python stuff like exception handling, and in it's current signature it can't return an error (it simply returns the buffer, or NULL if nothing was read).
4. As for why putting it in that position, basically there could be a few I/O paths, besides Py_UniversalNewlineFget, such as codec decoding in fp_readl (via decoding_fgets) that can fail in I/O as well. Looking at the code again (while not tested), maybe the check can be actually moved to the end of decoding_fgets in tokenizer.c i.e. if there is an ferror in tok->fp in the end of decoding_fgets then to return error_ret(tok); there, but then double eyes need to be sure that no other code path can have an I/O error. I am not an expert on the layout of tokenizer (read it mostly to figure out this bug) so if that's better it can be moved I guess.
msg254943 - (view) Author: (tzickel) * Date: 2015-11-19 23:12
Meador Inge any other questions regarding the issue ? I can't believe 2.7.11 is coming out soon, and nobody is taking this issue seriously enough...
msg254944 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2015-11-19 23:30
As I said previously, tzickel, this is an edge case in Python 2.7 which means it has a low priority for most core developers. Realize that we are also working towards a Python 3.5.1 release and a 3.4.4 release as well (which will be the first and last bugfix release for those feature releases, respectively). Compare that to Python 2.7 which probably has 10 more bugfix releases and where this issue has now existed for five years and this is the first report for it, meaning we are not rushing to get it done with other pressing issues to work on.

I realize it's serious to *you*, but that unfortunately doesn't translate it to it being as important to everyone else so please be patient. If this is still an issue for you then consider deleting all of your .pyc files and switching off bytecode file generation: https://docs.python.org/2/library/sys.html#sys.dont_write_bytecode .
msg268920 - (view) Author: (tzickel) * Date: 2016-06-20 19:42
any chance for 2.6.12 ? 4 line patch.
msg268921 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2016-06-20 19:46
Python 2.6 is no longer supported.
msg268944 - (view) Author: (tzickel) * Date: 2016-06-20 22:16
Sorry Brett of course I meant the upcoming 2.7.12 ....
msg304699 - (view) Author: (tzickel) * Date: 2017-10-21 09:52
OK, This issue has been biting me a few more times in production, so for now I've added the environment variable PYTHONDONTWRITEBYTECODE which resolves it (but it's a hack). I'm sure I am not the only one with it (recall that this is happening in a complex setup where I run python in windows via a network drive hosted in netapp, and runs thousands of times from many machines).

So, I've been looking for a simple way to show you it's a major fault at the python 2 import I/O error checking, and found that new strace has fault injection capability :)

In this demo I'll be running under debian sid (has strace version high enough for fault injection and latest python 2), you can use docker if you don't have it.

On my mac, I'm running (this example is on one of python's init modules, but of course can happen on any .py file):
----
user$ docker run -it --cap-add SYS_PTRACE debian:sid
----
The cap-add is needed for strace to run.
----
2dcc36934ea6">root@2dcc36934ea6:/# apt-get update && apt-get install -y strace python

2dcc36934ea6">root@2dcc36934ea6:/# python
Python 2.7.14 (default, Sep 17 2017, 18:50:44)
[GCC 7.2.0] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> exit()
----
Python works just fine.
----
2dcc36934ea6">root@2dcc36934ea6:/# strace -P /usr/lib/python2.7/sre_parse.pyc -P /usr/lib/python2.7/sre_parse.py -e trace=read -e fault=read python
read(7, 0x55c3ac2ad900, 4096)           = -1 EPERM (Operation not permitted) (INJECTED)
...
read(6, 0x55c3ac2cb680, 4096)           = -1 EPERM (Operation not permitted) (INJECTED)
...
Traceback (most recent call last):
  File "/usr/lib/python2.7/site.py", line 554, in <module>
    main()
...
  File "/usr/lib/python2.7/sre_compile.py", line 572, in compile
    p = sre_parse.parse(p, flags)
AttributeError: 'module' object has no attribute 'parse'
+++ exited with 1 +++
----
This command simply causes the python process to fail the read command on the files sre_parse.py and sre_parse.pyc (the .pyc btw already existed from previous run).
This should be OK, since it can't read a required module form disk.
----
2dcc36934ea6">root@2dcc36934ea6:/# python
Traceback (most recent call last):
  File "/usr/lib/python2.7/site.py", line 554, in <module>
    main()
...
  File "/usr/lib/python2.7/sre_compile.py", line 572, in compile
    p = sre_parse.parse(p, flags)
AttributeError: 'module' object has no attribute 'parse'
----
This is already bad, python does not work anymore, now even without an I/O error :(
----
2dcc36934ea6">root@2dcc36934ea6:/# ls -l /usr/lib/python2.7/sre_parse.pyc
-rw-r--r-- 1 root root 118 Oct 21 09:20 /usr/lib/python2.7/sre_parse.pyc
----
If we check, we see that the previous python instance with I/O error created an empty byte code valid sre_parse.pyc (you can check it by dis.dis it, and see it's empty code object), this is the crux of the bug.
----
2dcc36934ea6">root@2dcc36934ea6:/# rm /usr/lib/python2.7/sre_parse.pyc
----
let's delete the bad .pyc file
----
2dcc36934ea6">root@2dcc36934ea6:/# python
Python 2.7.14 (default, Sep 17 2017, 18:50:44)
[GCC 7.2.0] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> exit()
----
yey, python works now again
----
2dcc36934ea6">root@2dcc36934ea6:/# ls -l /usr/lib/python2.7/sre_parse.pyc
-rw-r--r-- 1 root root 21084 Oct 21 09:20 /usr/lib/python2.7/sre_parse.pyc
----
We can see now that the .pyc file has a much bigger size (21084 bytes compared to 118 from before)
----
2dcc36934ea6">root@2dcc36934ea6:/# strace -P /usr/lib/python2.7/sre_parse.pyc -P /usr/lib/python2.7/sre_parse.py -e trace=read -e fault=read python -B
read(7, 0x55ceb72a7900, 4096)           = -1 EPERM (Operation not permitted) (INJECTED)
...
read(6, 0x55ceb72c5680, 4096)           = -1 EPERM (Operation not permitted) (INJECTED)
...
Traceback (most recent call last):
  File "/usr/lib/python2.7/site.py", line 554, in <module>
    main()
...
AttributeError: 'module' object has no attribute 'parse'
+++ exited with 1 +++
----
We can now try this issue with python -B which should not try to create .pyc files
----
2dcc36934ea6">root@2dcc36934ea6:/# python
Python 2.7.14 (default, Sep 17 2017, 18:50:44)
[GCC 7.2.0] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> exit()
----
yey, python still works (hopefully if this is an network I/O error, it will stop occurring on a future run with a watchdog for an server app)
----

A less likely variant (but possible) is that if the .pyc does not exist, and you have an I/O error on importing the .py, it will produce an bad .pyc file, you can try it:
2dcc36934ea6">root@2dcc36934ea6:/# rm /usr/lib/python2.7/sre_parse.pyc
2dcc36934ea6">root@2dcc36934ea6:/# strace -P /usr/lib/python2.7/sre_parse.py -e trace=read -e fault=read python
2dcc36934ea6">root@2dcc36934ea6:/# python
Python will not work until you delete the .pyc file.

----

(*) The I/O error needs to occur when the AST parser will still produce a valid code object, this easily happens when the error is just when starting to read the file (will produce an empty code AST which will create an empty code object)

----

You can install python3 and see that it doesn't effect it the same way (i.e. it will abort on the I/O error, but it won't produce a bad .pyc file, since it has much better I/O error handling)

----
5ef2a0658fa8">root@5ef2a0658fa8:/# strace -P /usr/lib/python3.5/encodings/aliases.py -P /usr/lib/python3.5/encodings/__pycache__/aliases.cpython-35.pyc -e trace=read -e fault=read python3
....
Aborted

5ef2a0658fa8">root@5ef2a0658fa8:/# python3
Python 3.5.2 (default, Sep 14 2017, 22:51:06)
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> exit()

----
The patch previously included should fix this issue.
msg304701 - (view) Author: (tzickel) * Date: 2017-10-21 09:58
Ignore the hash append / link at the start of each shell command (it's the output from docker, and not related to python commits).

BTW, forgot to mention, of course when doing the fault injection on the .py files, the error is bad as well, it should be I/O error, and instead it shows that it's an empty module:
AttributeError: 'module' object has no attribute 'parse'

(The patch actually fixes that).
msg304710 - (view) Author: (tzickel) * Date: 2017-10-21 16:04
Added a script to check if the bug exists (provided you have an updated strace 4.15 or above).

Without patch:
# ./import_io_check.sh
strace: Requested path 'tmp.py' resolved into '/root/tmp.py'
read(3, 0x55fc3a71cc50, 4096)           = -1 ENOSYS (Function not implemented) (INJECTED)
read(3, 0x55fc3a71cc50, 4096)           = -1 ENOSYS (Function not implemented) (INJECTED)
Traceback (most recent call last):
  File "<string>", line 1, in <module>
ImportError: No module named py
+++ exited with 1 +++
Traceback (most recent call last):
  File "<string>", line 1, in <module>
ImportError: No module named py
Bug exists an incorrect .pyc has been produced

With patch:
# PYTHON=Python-2.7.14-with-patch/python ./import_io_check.sh
strace: Requested path 'tmp.py' resolved into '/root/tmp.py'
read(3, 0x55a8ff7d3020, 4096)           = -1 ENOSYS (Function not implemented) (INJECTED)
read(3, 0x55a8ff7d3020, 4096)           = -1 ENOSYS (Function not implemented) (INJECTED)
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "tmp.py", line 1

    ^
SyntaxError: unexpected EOF while parsing
+++ exited with 1 +++
Script finished successfully
msg304724 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2017-10-21 20:33
I'm un-cc'ing myself as I don't use Python 2 in an environment where we allow it to write .pyc files at application runtime.
msg304727 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2017-10-22 02:55
Adding a couple of Red Hat folks to the nosy list, as even though this was originally reported for Windows, the reproducers show that it's a cross-platform issue.

tzickel would you mind signing the Python CLA and turning your patch into a PR on Github?
msg304994 - (view) Author: Petr Viktorin (petr.viktorin) * (Python committer) Date: 2017-10-25 15:36
tzickel, could you sign the CLA? It will formally allow PSF to distribute your patch with Python. (You retain the copyright.) See the dev guide for instructions: https://docs.python.org/devguide/pullrequest.html#cla

I can't do that for you, but if you want, I can do the rest (open a pull request and convince some core developer to merge it) after you sign the CLA.
msg324950 - (view) Author: Petr Viktorin (petr.viktorin) * (Python committer) Date: 2018-09-10 18:46
New changeset f64c813de84011a84ca21d75a294861a9cc2dfdc by Petr Viktorin (tzickel) in branch '2.7':
bpo-25083: Python can sometimes create incorrect .pyc files (GH-8449)
https://github.com/python/cpython/commit/f64c813de84011a84ca21d75a294861a9cc2dfdc
History
Date User Action Args
2018-11-24 21:47:13vstinnersetcomponents: + Interpreter Core
2018-11-19 10:12:36vstinnersetresolution: fixed
2018-09-10 18:49:57petr.viktorinsetstatus: open -> closed
stage: patch review -> resolved
2018-09-10 18:46:18petr.viktorinsetmessages: + msg324950
2018-07-24 19:24:54brett.cannonsetnosy: - brett.cannon
2018-07-24 19:08:32tzickelsetstage: patch review
pull_requests: + pull_request7971
2017-10-25 15:49:20vstinnersetnosy: + vstinner
2017-10-25 15:36:07petr.viktorinsetmessages: + msg304994
2017-10-22 02:55:52ncoghlansetnosy: + petr.viktorin, cstratak
messages: + msg304727
2017-10-21 20:33:23gregory.p.smithsetnosy: - gregory.p.smith
2017-10-21 20:33:10gregory.p.smithsetnosy: brett.cannon, gregory.p.smith, ncoghlan, benjamin.peterson, meador.inge, eric.snow, serhiy.storchaka, tzickel
messages: + msg304724
2017-10-21 16:04:25tzickelsetfiles: + import_io_check.sh
nosy: + brett.cannon
messages: + msg304710

2017-10-21 09:58:03tzickelsetmessages: + msg304701
2017-10-21 09:53:03tzickelsetmessages: + msg304699
2016-06-20 22:16:19tzickelsetmessages: + msg268944
2016-06-20 19:46:31brett.cannonsetnosy: - brett.cannon
2016-06-20 19:46:24brett.cannonsetmessages: + msg268921
2016-06-20 19:42:38tzickelsetmessages: + msg268920
2016-01-03 19:20:22tzickelsetnosy: + serhiy.storchaka
2015-11-19 23:30:01brett.cannonsetmessages: + msg254944
2015-11-19 23:12:43tzickelsetmessages: + msg254943
2015-11-02 19:10:55tzickelsetmessages: + msg253943
2015-11-01 21:44:28meador.ingesetmessages: + msg253878
2015-11-01 18:48:32brett.cannonsetmessages: + msg253866
2015-11-01 18:31:13tzickelsetmessages: + msg253863
2015-09-23 17:16:45tzickelsetnosy: + ncoghlan, eric.snow
messages: + msg251435
2015-09-18 18:52:49brett.cannonsetmessages: + msg251020
2015-09-18 16:41:16tzickelsetfiles: + ferror_check_in_tokenizer.patch

nosy: + benjamin.peterson
messages: + msg251011

keywords: + patch
2015-09-15 14:56:00tzickelsetmessages: + msg250775
2015-09-14 11:27:09tzickelsetnosy: + brett.cannon, meador.inge
2015-09-14 10:49:35tzickelsetmessages: + msg250653
2015-09-14 04:42:27gregory.p.smithsetmessages: + msg250619
2015-09-14 04:42:06gregory.p.smithsetnosy: + gregory.p.smith
messages: + msg250617
2015-09-13 11:26:12tzickelcreate