This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: types.CodeType() has no bytecode verifier
Type: crash Stage: resolved
Components: Interpreter Core Versions: Python 3.10, Python 3.9, Python 3.8
process
Status: closed Resolution: not a bug
Dependencies: Superseder:
Assigned To: Nosy List: BTaskaya, Dennis Sweeney, eric.smith, sbz, vstinner
Priority: normal Keywords: patch

Created on 2020-11-21 07:20 by sbz, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
crash.py sbz, 2020-11-21 07:20 crash poc
crash-report-txt.zip sbz, 2020-11-21 07:30
add-new-crasher-to-cover-multiple-versions.diff sbz, 2020-11-21 09:25
Pull Requests
URL Status Linked Edit
PR 23448 closed sbz, 2020-11-21 17:00
Messages (16)
msg381527 - (view) Author: Sofian Brabez (sbz) * Date: 2020-11-21 07:20
This PoC is causing a local crash of python interpreters version 2.7,3.6,3.7,3.8 and 3.9.

By creating a code object of size 0 with a POP_TOP opcode, in Python/ceval.c the call to Py_DECREF(value) on a NULL pointer lead to a segmentation fault of the python interpreter.

It was tested on all python3.x versions against a fresh compilation of a git clone github.com/python/cpython.git on branches and master. You need to adapt the code() constructor because the parameters are different across versions but crash remains.

I'm just covering the version 3.7 in following text

$ git clone --depth 1 https://github.com/python/cpython.git
$ git checkout -b 3.7 origin/3.7
$ export CFLAGS+="-g -O0"
$ ./configure
$ make
$ ./python -V
Python 3.7.9+
$ ./python -c 'import sys; print(sys.version)'
3.7.9+ (heads/3.7-dirty:08ba61dade, Nov 21 2020, 04:57:20) 
[Clang 10.0.1 (git@github.com:llvm/llvm-project.git llvmorg-10.0.1-0-gef32c611a
$ ./python crash.py

Running the python3.7 execution into gdb, helped me to locate the crash for python3.7 https://github.com/python/cpython/blob/3.7/Python/ceval.c#L1104

$ gdb --batch --silent ./python -ex 'r crash.py'
Program received signal SIGSEGV, Segmentation fault.
0x000000000033873a in _PyEval_EvalFrameDefault (f=0x800bdda00, throwflag=0) at Python/ceval.c:1104
1104                Py_DECREF(value);

Also I have executed the PoC on different platforms Linux, FreeBSD and MacOSX. The behaviour is the same and SIGSEGV the interpreter.

I have located the issue in the source code but I'm wondering what will be the best solution to fix it? Python developers should know better, I am open to your advices and suggestions.

I have noticed that one assertion handle this case (in master) https://github.com/python/cpython/blob/master/Python/ceval.c#L1430 but most of the interpreters are built without --with-assertions enabled, so the crash will still persist.

More details on this gist https://gist.github.com/sbz/267d35de5766c53835c5c4ef45b18705

I think the python interpreter shouldn't crash and handle properly this edge case.
msg381528 - (view) Author: Sofian Brabez (sbz) * Date: 2020-11-21 07:30
Linux, FreeBSD and MacOSX crash reports and backtraces joined in the zip.

Contributor Agreement 2020-09-23 signed.
msg381529 - (view) Author: Dennis Sweeney (Dennis Sweeney) * (Python committer) Date: 2020-11-21 08:32
From https://github.com/python/cpython/blob/master/Lib/test/crashers/bogus_code_obj.py :

"""
Broken bytecode objects can easily crash the interpreter.
This is not going to be fixed.  It is generally agreed that there is no
point in writing a bytecode verifier and putting it in CPython just for
this.  Moreover, a verifier is bound to accept only a subset of all safe
bytecodes, so it could lead to unnecessary breakage.
For security purposes, "restricted" interpreters are not going to let
the user build or load random bytecodes anyway.  Otherwise, this is a
"won't fix" case.
"""

import types

co = types.CodeType(0, 0, 0, 0, 0, b'\x04\x71\x00\x00',
                    (), (), (), '', '', 1, b'')
exec(co)
msg381536 - (view) Author: Sofian Brabez (sbz) * Date: 2020-11-21 09:25
Thanks Dennis for pointing me the crashers, I was not aware of them.

I have added a new crash test to cover multiple crashes in different interpreter versions. Current bogus does not crash where it could be possible to crash in older and newer interpreter versions.

Do you think it's worth to add it? I joined the patch against master and if yes I could submit a new PR.
msg381620 - (view) Author: Batuhan Taskaya (BTaskaya) * (Python committer) Date: 2020-11-22 16:24
I'm sorry to interrupt but what is the exact reasoning behind adding a new, (I presume) redundant crasher? There are tons of different ways to crash the interpreter with malformed bytecode, how would adding only one of them bring any good?
msg381688 - (view) Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2020-11-23 18:53
@BTaskaya: do you think this is too similar to bogus_code_obj.py? That's the only crasher I can see it being similar to.
msg381697 - (view) Author: Batuhan Taskaya (BTaskaya) * (Python committer) Date: 2020-11-23 20:41
> do you think this is too similar to bogus_code_obj.py? That's the only crasher I can see it being similar to.

As far as I assume, yes, that is the generic VM crasher via custom code object execution. I feel its existence is good enough to answer to issue openings like this.
msg382695 - (view) Author: Sofian Brabez (sbz) * Date: 2020-12-07 23:11
Any ETA? I would appreciate to see my contribution being merged.

Batuhan, the current code is not doing what it's supposed to do in bogus_code_obj.py (i.e. no crash after 3.8 and above in master and branches). This still make the issue valid I guess.

```
$ ./python -V
Python 3.8.6+
$ ./python -V
Python 3.10.0a2+
$ ./python Lib/test/crashers/bogus_code_obj.py 
Traceback (most recent call last):
  File "/usr/home/sbz/github/cpython/Lib/test/crashers/bogus_code_obj.py", line 17, in <module>
    co = types.CodeType(0, 0, 0, 0, 0, b'\x04\x71\x00\x00',
TypeError: code expected at least 14 arguments, got 13
```

I would be happy to update and fix it instead if you think it's better that way. Please let me know, it will be great to see my first submission being treated.
msg382708 - (view) Author: Dennis Sweeney (Dennis Sweeney) * (Python committer) Date: 2020-12-08 03:55
Why not just fix bogus_code_obj.py?

Something like this (using the replace method) would make it more future-proof to similar changes in the code object constructor signature (and be more readable!):

import dis
POP_TOP = dis.opmap['POP_TOP']
wordcode = bytes([POP_TOP, 0] * 10)
f = lambda: None
f.__code__ = f.__code__.replace(co_code=wordcode)
f()
msg382712 - (view) Author: Batuhan Taskaya (BTaskaya) * (Python committer) Date: 2020-12-08 07:47
> I would be happy to update and fix it instead if you think it's better that way. Please let me know, it will be great to see my first submission being treated.

Yes, please. That seems more appropriate
msg382713 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-12-08 07:51
About PR 23448: I don't see the value of adding a script just prove that CPython has no bytecode verifier.

If you consider that it's not well documented, it should be documented in the types.CodeType documentation:
https://docs.python.org/dev/library/types.html#types.CodeType
msg382714 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-12-08 07:53
Compiled bytecode is not portable between Python versions. You may use https://bytecode.readthedocs.io/ which might help to write more portable code, but this project does not contain a bytecode verifier and I'm not sure that it's updated when the Python bytecode becomes backward incompatible. (I wrote the first version as a Proof-of-Concept of my PEP 511, but I no longer maintain it.)
msg382715 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-12-08 07:55
> You may use https://bytecode.readthedocs.io/ which might help to write more portable code, (...)

Ah, I didn't notice that at the beginning of the documentation, you can see that even a very simple loop has a different bytecode before and after Python 3.8:

"Python 3.8 removed SETUP_LOOP"
https://bytecode.readthedocs.io/en/latest/usage.html#simple-loop
msg391442 - (view) Author: Sofian Brabez (sbz) * Date: 2021-04-20 14:54
It's been a while and I still have no clear guidance from there of what developers want to do.

Follow-up on this again to see if requires updates or just close it.
msg391462 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-04-20 18:22
In terms of security model, usually, if an attacker can execute arbitrary Python code, the game is over. Executing bytecode is the same. Python doesn't provide any tooling to validate bytecode in its stdlib.

https://python-security.readthedocs.io/security.html#python-security-model

If you consider that it's an important use case, you can create a project on PyPI to validate bytecode. I don't think that it belongs to the stdlib.

Python/ceval.c doesn't validate bytecode at runtime for performance reasons.
msg391538 - (view) Author: Sofian Brabez (sbz) * Date: 2021-04-21 18:20
Thanks Victor, it's clear now.

About the updates of the crashers, I sent the updated patches to update them in PR [3] before the following issue [1] and PR [2] but the other seems to have been merged already. 

It seems my submission was totally forgot and now obsolete. So I'm gonna close this ticket.

[1] https://bugs.python.org/issue42734
[2] https://github.com/python/cpython/pull/23939
[3] https://github.com/python/cpython/pull/23448
History
Date User Action Args
2022-04-11 14:59:38adminsetgithub: 86588
2021-04-22 01:19:18benjamin.petersonsetstatus: open -> closed
resolution: not a bug
stage: patch review -> resolved
2021-04-21 18:20:43sbzsetmessages: + msg391538
2021-04-20 18:22:30vstinnersetmessages: + msg391462
2021-04-20 14:54:28sbzsetmessages: + msg391442
2020-12-08 07:55:05vstinnersetmessages: + msg382715
2020-12-08 07:53:15vstinnersetmessages: + msg382714
title: Py_Decref on value crash the interpreter in Python/ceval.c:1104 -> types.CodeType() has no bytecode verifier
2020-12-08 07:51:06vstinnersetnosy: + vstinner
messages: + msg382713
2020-12-08 07:47:29BTaskayasetmessages: + msg382712
2020-12-08 03:55:17Dennis Sweeneysetmessages: + msg382708
2020-12-07 23:11:11sbzsetmessages: + msg382695
2020-11-23 20:41:21BTaskayasetmessages: + msg381697
2020-11-23 18:53:55eric.smithsetmessages: + msg381688
versions: + Python 3.10, - Python 3.6, Python 3.7
2020-11-22 16:24:47BTaskayasetnosy: + BTaskaya
messages: + msg381620
2020-11-21 21:03:12eric.smithsetnosy: + eric.smith
2020-11-21 17:00:42sbzsetstage: patch review
pull_requests: + pull_request22340
2020-11-21 09:26:00sbzsetfiles: + add-new-crasher-to-cover-multiple-versions.diff
keywords: + patch
messages: + msg381536
2020-11-21 08:32:44Dennis Sweeneysetnosy: + Dennis Sweeney
messages: + msg381529
2020-11-21 07:30:06sbzsetfiles: + crash-report-txt.zip

messages: + msg381528
2020-11-21 07:20:07sbzcreate