classification
Title: py_compile.compile() replaces target files, breaking special files and symlinks
Type: Stage: resolved
Components: Library (Lib) Versions: Python 3.4
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: brett.cannon Nosy List: Arfrever, arigo, barry, brett.cannon, eric.snow, ncoghlan, pitrou, python-dev
Priority: normal Keywords: easy

Created on 2013-02-17 21:23 by Arfrever, last changed 2013-06-17 21:54 by Arfrever. This issue is now closed.

Messages (32)
msg182283 - (view) Author: Arfrever Frehtes Taifersar Arahesis (Arfrever) * Date: 2013-02-17 21:23
Since d4eb02b6aac9 py_compile.compile() replaces target files, breaking special files and symlinks. Any custom permissions set on target files are also lost. This is a major regression.

# cd /tmp
# touch test.py
# ls -l /dev/null
crw-rw-rw- 1 root root 1, 3 Feb 17 21:16 /dev/null
# python3.3 -c 'import py_compile; py_compile.compile("test.py", cfile="/dev/null")'
# ls -l /dev/null
crw-rw-rw- 1 root root 1, 3 Feb 17 21:16 /dev/null
# python3.4 -c 'import py_compile; py_compile.compile("test.py", cfile="/dev/null")'
# ls -l /dev/null
-rw-r----- 1 root root 102 Feb 17 21:53 /dev/null
msg182334 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2013-02-18 19:02
So py_compile always replaced the file, even in Python 3.3: http://hg.python.org/cpython/file/79ea59b394bf/Lib/py_compile.py#l141 (this is why I changed the title of the bug). What did change, though, is that py_compile now writes bytecode files just as import does, which means copying the file permissions of the source file for the bytecode file. E.g. if you listed the permissions for test.py in your example output I bet it's -rw-r----- just like what /dev/null ended up with.

I'm not going to change this as this is very much on purpose for security reasons; it's more of a long-standing bug that py_compile never used the proper permissions than it is changing the semantics to what they are today. Since py_compile is for compiling and writing out source code it should match what import does (if you are doing this to verify the file will parse properly, which is what writing to /dev/null suggests, it would be more accurate to just pass the raw source code through the AST module and make sure no exceptions are raised, else just write into the /tmp directory and let the OS clean up).

What I will do, though, it document this fact in the docs with a "Changed in Python 3.4" note and in What's New since people should be aware of it.
msg182335 - (view) Author: Arfrever Frehtes Taifersar Arahesis (Arfrever) * Date: 2013-02-18 19:24
In case of /dev/null, the main problem is that it became a regular file. It was previously a character device. Writing to character devices should not replace them.

What about problem with symlinks?:
$ cd /tmp
$ touch test.py
$ ln -s target.pyc test.pyc
$ python3.3 -c 'import py_compile; py_compile.compile("test.py", cfile="test.pyc")'
$ ls -l test.py test.pyc target.pyc
-rw-r--r-- 1 Arfrever Arfrever 102 02-18 20:20 target.pyc
-rw-r--r-- 1 Arfrever Arfrever   0 02-18 20:20 test.py
lrwxrwxrwx 1 Arfrever Arfrever  10 02-18 20:20 test.pyc -> target.pyc
$ python3.4 -c 'import py_compile; py_compile.compile("test.py", cfile="test.pyc")'
$ ls -l test.py test.pyc target.pyc
-rw-r--r-- 1 Arfrever Arfrever 102 02-18 20:20 target.pyc
-rw-r--r-- 1 Arfrever Arfrever   0 02-18 20:20 test.py
-rw-r--r-- 1 Arfrever Arfrever 102 02-18 20:21 test.pyc
msg182337 - (view) Author: Arfrever Frehtes Taifersar Arahesis (Arfrever) * Date: 2013-02-18 19:34
py_compile.compile() was always replacing contents of target file, but was not causing that a given link name would refer to a different inode. builtins.open() has correct behavior:

# stat /dev/null
  File: ‘/dev/null’
  Size: 0               Blocks: 0          IO Block: 4096   character special file
Device: 5h/5d   Inode: 3242180     Links: 1     Device type: 1,3
Access: (0666/crw-rw-rw-)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2013-02-18 20:11:07.883986147 +0100
Modify: 2013-02-18 20:11:07.883986147 +0100
Change: 2013-02-18 20:11:07.883986147 +0100
 Birth: -
# python3.4 -c 'null = open("/dev/null", "wb"); null.write(b"abc"); null.close()'
# stat /dev/null
  File: ‘/dev/null’
  Size: 0               Blocks: 0          IO Block: 4096   character special file
Device: 5h/5d   Inode: 3242180     Links: 1     Device type: 1,3
Access: (0666/crw-rw-rw-)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2013-02-18 20:11:07.883986147 +0100
Modify: 2013-02-18 20:11:07.883986147 +0100
Change: 2013-02-18 20:11:07.883986147 +0100
 Birth: -
msg182383 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2013-02-19 14:36
So that all happens because importlib does an atomic write of the file which uses os.replace(): http://hg.python.org/cpython/file/83d70dd58fef/Lib/importlib/_bootstrap.py#l121 .

Unless there is some way that I can't think of to have the atomic write still exist but not change the type of file it replaces I am still not willing to revert my change just for this use case. There should be a single implementation of bytecode file generation and py_compile (along with compileall) should be nothing more than convenience modules for forcing the generation of those files under the same semantics as if they were done as a side-effect of importing some source code.
msg182397 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-02-19 17:05
> Unless there is some way that I can't think of to have the atomic write 
> still exist but not change the type of file it replaces

How about using the `mode` to write_atomic?
(which, incidentally, is already used to mirror the .py file's permissions in SourceFileLoader._cache_bytecode)
msg182400 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2013-02-19 17:31
Use the mode how exactly? I mean isn't the problem the os.replace() call and not os.open() on the source file?
msg182402 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-02-19 18:17
> Use the mode how exactly? I mean isn't the problem the os.replace()
> call and not os.open() on the source file?

If you want to reproduce the original file's access rights, you have to
pass the right mode flags to os.open().
Of course, this won't recreate symlinks and the like. But I don't think
we can do something for that anyway, since we want to replace to happen
automatically. People who like to have symlinks for their pyc files
probably have their own custom scripts to create pyc files, so they
should just re-use them.
msg182403 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-02-19 18:19
> Of course, this won't recreate symlinks and the like. But I don't think
> we can do something for that anyway, since we want to replace to happen
> automatically.

I meant: we want the replace to happen atomically :-)
msg182416 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2013-02-19 19:51
I think Arfever is more frustrated by the os.replace() call than the permissions.
msg182419 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-02-19 19:52
Ah, right. Well, there would be an argument not to use os.replace() in py_compile, since it's an offline processing step which generally shouldn't race with another (online) processing step.

Still, I wonder what the use case is (apart from the /dev/null case for which the answer is simply "don't do it" :-)).
msg182422 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2013-02-19 19:58
Exactly, and I don't want to have to slice up the internal API anymore in order to support this edge case which I don't think is important enough to support.
msg184090 - (view) Author: Roundup Robot (python-dev) Date: 2013-03-13 16:38
New changeset 55806d234653 by Brett Cannon in branch 'default':
Issue #17222: Document that py_compile now uses importlib for its file
http://hg.python.org/cpython/rev/55806d234653
msg189385 - (view) Author: Barry A. Warsaw (barry) * (Python committer) Date: 2013-05-16 15:57
Antoine says:

> Ah, right. Well, there would be an argument not to use os.replace() in 
> py_compile, since it's an offline processing step which generally 
> shouldn't race with another (online) processing step.

But I think that's not necessarily true.

http://mail.python.org/pipermail/python-dev/2013-May/126241.html
msg189388 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-05-16 17:06
> > Ah, right. Well, there would be an argument not to use os.replace() in 
> > py_compile, since it's an offline processing step which generally 
> > shouldn't race with another (online) processing step.
> 
> But I think that's not necessarily true.
> 
> http://mail.python.org/pipermail/python-dev/2013-May/126241.html

Ha :) Then you know which kind of patch you have to try.
msg189422 - (view) Author: Armin Rigo (arigo) * (Python committer) Date: 2013-05-17 07:12
Can someone confirm the answer to Arfrever's original question: a seemingly innocent use case of py_compile.compile(), which works fine until Python 3.3, when executed as root, can in Python 3.4 fundamentally break down a complete Posix system in a non-obvious way?
msg189426 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-05-17 07:22
Yes, the switch to renaming can change behaviour in some corner cases (I say "corner cases" because the expected situation is to have your pyc files be regular files, not symlinks or character devices). Python is certainly not the only application where you can bust /dev/null by specifying it as target location.

Mutating a file in place is a source of unpredictable issues with concurrent access of the file being written to, which is why it was changed to renaming. I think the class of issues which was solved (presumably) is much more important than the use case of compiling to /dev/null.

As for symlinks, I'd naively expect breaking symlinks to be a feature, but YMMV.
msg189428 - (view) Author: Armin Rigo (arigo) * (Python committer) Date: 2013-05-17 07:48
My point is that in five years' time some existing well-tested sysadmin-like program will be run with Python 3.4 for the first time, and suddenly whole systems will break.  I don't want to sound negative but this is the worst behavior ever (with the possible exception of programs that do bad things beyond your own system, like viruses).  Please at least crash with an exception instead: add one easy check to py_compile.compile() and complain if cfile points to an existing non-regular file.
msg189435 - (view) Author: Arfrever Frehtes Taifersar Arahesis (Arfrever) * Date: 2013-05-17 09:32
I fixed syntax sanity check in Portage (package manager in Gentoo) to avoid breaking system. This sanity check is always run during upgrade of Portage itself.
http://git.overlays.gentoo.org/gitweb/?p=proj/portage.git;a=commitdiff;h=e8d1337b5936ee058872d25f21b914e96bcbf677
msg189441 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-05-17 09:50
> I fixed syntax sanity check in Portage (package manager in Gentoo) to
> avoid breaking system. This sanity check is always run during
> upgrade of Portage itself.

Well, we could add a dedicated function for sanity checking, then.
msg189442 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-05-17 09:51
> Please at least crash with an exception instead: add
> one easy check to py_compile.compile() and complain if cfile points
> to an existing non-regular file.

That's a good point (except that symlinks should probably be allowed too).
msg189870 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2013-05-23 17:56
If someone wants to submit a patch to make the change that's fine, but they are going to have the same issue when they simply execute an import that byte-compiles as a side-effect so you will have to decide if you are only worried about py_compile.
msg190600 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2013-06-04 14:44
I'm going to toss Armin's request over to core-mentorship since the fix is easy. Is all you are after, Armin, a ``not os.path.isfile(cfile) or os.path.islink(cfile)`` check which if true raises an exception?
msg190612 - (view) Author: Armin Rigo (arigo) * (Python committer) Date: 2013-06-04 21:03
That's not really what I'm after, but that's a workaround I proposed to avoid the worst breakage.  I don't have an opinion about what to do with symlinks.
msg190616 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2013-06-04 21:52
Armin: I'm going off of http://bugs.python.org/msg189428 where you say you want an exception raised. If there is something else you want to suggest that's fine as long as it's not to revert the semantics since I'm not willing to do that.
msg190660 - (view) Author: Armin Rigo (arigo) * (Python committer) Date: 2013-06-05 13:45
Brett: I don't really have a constructive opinion on this matter.  I could suggest putting a bit more logic in py_compile.compile(cfile=...) to revert to the old behavior if we specify an already-existing target, in order to fix the other reported issue that custom permissions set on the target file are lost.  But I would understand if you decide that this is not compatible with the gains of atomic file creations.  I only have a problem with replacing block devices with regular files.  I don't have a particular opinion if the target file is a symlink (as long, of course, as it doesn't lead to whatever it points to being replaced, but it shouldn't).
msg190663 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2013-06-05 14:35
Thanks for the clarification, Armin!
msg190664 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2013-06-05 14:46
OK, so here was what a patch will need:

* Change py_compile to raise an exception when ``not os.path.isfile(cfile) or os.path.islink(cfile)``; probably raise ValueError and mention that import itself would replace the file w/o raising an exception
* Add tests
* Update the docs for py_compile
* Update What's New for 3.4 since this is not backwards-compatible
msg190681 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2013-06-05 16:49
I'm going to resolve this issue myself, but "blog" about it on core-mentorship so others can follow along with how I resolve the bug.
msg191164 - (view) Author: Roundup Robot (python-dev) Date: 2013-06-14 22:33
New changeset 46ef1d2af352 by Brett Cannon in branch 'default':
Issue #17222: Raise FileExistsError when py_compile.compile would
http://hg.python.org/cpython/rev/46ef1d2af352
msg191313 - (view) Author: Arfrever Frehtes Taifersar Arahesis (Arfrever) * Date: 2013-06-17 04:46
Exception messages are wrong. They contain name of source file instead of target file.
msg191378 - (view) Author: Roundup Robot (python-dev) Date: 2013-06-17 21:48
New changeset e2ccfb096e6a by Brett Cannon in branch 'default':
Issue #17222: fix a mix-up in some exception messages.
http://hg.python.org/cpython/rev/e2ccfb096e6a
History
Date User Action Args
2013-06-17 21:54:20Arfreversetstage: resolved
2013-06-17 21:48:53brett.cannonsetstatus: open -> closed
resolution: fixed
2013-06-17 21:48:35python-devsetmessages: + msg191378
2013-06-17 04:46:29Arfreversetstatus: closed -> open
resolution: fixed -> (no value)
messages: + msg191313

stage: resolved -> (no value)
2013-06-14 22:33:56brett.cannonsetstatus: open -> closed
resolution: fixed
stage: needs patch -> resolved
2013-06-14 22:33:29python-devsetmessages: + msg191164
2013-06-05 16:49:45brett.cannonsetassignee: brett.cannon
messages: + msg190681
2013-06-05 14:46:03brett.cannonsetresolution: fixed -> (no value)
messages: + msg190664
components: + Library (Lib), - Documentation
2013-06-05 14:35:15brett.cannonsetmessages: + msg190663
2013-06-05 13:45:34arigosetmessages: + msg190660
2013-06-04 21:52:30brett.cannonsetmessages: + msg190616
2013-06-04 21:03:04arigosetmessages: + msg190612
2013-06-04 14:44:23brett.cannonsetstatus: closed -> open
assignee: brett.cannon -> (no value)
messages: + msg190600
2013-05-23 17:56:27brett.cannonsetmessages: + msg189870
2013-05-17 09:58:19Arfreversettitle: py_compile.compile() explicitly sets st_mode for written files -> py_compile.compile() replaces target files, breaking special files and symlinks
2013-05-17 09:51:50pitrousetmessages: + msg189442
2013-05-17 09:50:45pitrousetmessages: + msg189441
title: py_compile.compile() explicitly sets st_mode for written files -> py_compile.compile() explicitly sets st_mode for written files
2013-05-17 09:32:55Arfreversetmessages: + msg189435
2013-05-17 07:48:29arigosetmessages: + msg189428
2013-05-17 07:22:41pitrousetmessages: + msg189426
2013-05-17 07:12:36arigosetnosy: + arigo
messages: + msg189422
2013-05-16 17:06:25pitrousetmessages: + msg189388
2013-05-16 15:57:16barrysetmessages: + msg189385
2013-05-16 15:42:51barrysetnosy: + barry
2013-03-13 16:38:39brett.cannonsetstatus: open -> closed
resolution: fixed
2013-03-13 16:38:25python-devsetnosy: + python-dev
messages: + msg184090
2013-02-19 19:58:40brett.cannonsetmessages: + msg182422
2013-02-19 19:52:58pitrousetmessages: + msg182419
2013-02-19 19:51:33brett.cannonsetmessages: + msg182416
2013-02-19 18:19:12pitrousetmessages: + msg182403
2013-02-19 18:17:39pitrousetmessages: + msg182402
2013-02-19 17:31:36brett.cannonsetmessages: + msg182400
2013-02-19 17:05:26pitrousetmessages: + msg182397
2013-02-19 14:36:25brett.cannonsetnosy: + pitrou
messages: + msg182383
2013-02-18 19:34:32Arfreversetmessages: + msg182337
2013-02-18 19:24:10Arfreversetmessages: + msg182335
2013-02-18 19:02:17brett.cannonsetpriority: high -> normal

assignee: brett.cannon
components: + Documentation
title: py_compile.compile() replaces target files, breaking special files and symlinks -> py_compile.compile() explicitly sets st_mode for written files
keywords: + easy
messages: + msg182334
stage: needs patch
2013-02-17 21:23:44Arfrevercreate