classification
Title: About vulnerabilities in Cpython native code
Type: security Stage: patch review
Components: Build Versions: Python 3.11, Python 3.10
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: eric.smith, miss-islington, nanjekyejoannah, pablogsal, rhettinger, serhiy.storchaka, terry.reedy, vstinner
Priority: normal Keywords: patch

Created on 2022-01-06 13:07 by urnotmax, last changed 2022-01-14 04:35 by miss-islington.

Files
File name Uploaded Description Edit
CPython3.10.0_vulnerability_report.txt urnotmax, 2022-01-06 13:07
Pull Requests
URL Status Linked Edit
PR 30445 closed rhettinger, 2022-01-06 21:29
PR 30591 merged vstinner, 2022-01-14 01:53
PR 30592 merged miss-islington, 2022-01-14 04:11
PR 30593 merged miss-islington, 2022-01-14 04:11
Messages (26)
msg409841 - (view) Author: Lin (urnotmax) * Date: 2022-01-06 13:07
I am currently doing some research on the security of CPython. I used the open source vulnerability analysis engine, Infer(https://fbinfer.com/), to scan the native code of CPython 3.10.0. 

The scan results show that there are still a number of vulnerabilities in the CPython native code, such as Null dereference, Uninitialized variable, Resource/Memory leak, etc. Moreover, I found that some of the vulnerabilities are related to Python/C API. I enclose the vulnerability report for your reference.

Based on the research of the result, I tried to design a tool to automatically detect and repair vulnerabilities in CPython and make this tool available. See:

https://github.com/PVMPATCH/PVMPatch

Python is my favourite programming language. I sincerely hope that I can help Python become stronger and safer. I hope this discovery can be useful for you to develop Python in the future.
msg409854 - (view) Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2022-01-06 14:52
Thank you for posting this.

Some of these look like false positives.

For example:
#263
Parser/string_parser.c:670: error: Uninitialized Value
  The value read from parenstack[_] was never initialized.
  668.             }
  669.             nested_depth--;
  670.             int opening = (unsigned char)parenstack[nested_depth];
                   ^
  671.             if (!((opening == '(' && ch == ')') ||
  672.                   (opening == '[' && ch == ']') ||

I don't see how this could be an uninitialized read, although I'm willing to be wrong.

If your tool can produce patches to fix reported problems, I suggest that you create PRs for specific issues, so they can be reviewed individually. There's no way we'd review a single patch for all 673 issues that were identified.

Also, looking at the first one:
#0
Objects/clinic/bytearrayobject.c.h:50: error: Dead Store
  The value written to &noptargs (type long) is never used.
  48.             goto exit;
  49.         }
  50.         if (!--noptargs) {
                   ^
  51.             goto skip_optional_pos;
  52.         }

We've discussed this before. The consensus last time was to leave code like this in place, in case other code was added after this that refers to the same pointer. Our assumption is that compilers will remove the unneeded store. Is it possible to remove Dead Stores from the output, and/or produce a separate output with just Dead Stores? I don't see how a Dead Store can be a vulnerability.
msg409859 - (view) Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2022-01-06 15:17
> Based on the research of the result, I tried to design a tool to automatically detect and repair vulnerabilities in CPython and make this tool available. See:

You mention here that your tool automatically "repairs" the code. Could you submit a sample PR with the repairs that your tool does so we can evaluate it?
msg409868 - (view) Author: Zachary Ware (zach.ware) * (Python committer) Date: 2022-01-06 15:57
As an aside, there's an issue with Roundup where a username composed of all digits causes it to think that name is a user ID in the nosy list.  I recommend changing your username to include a non-digit character so that others can interact with you on your issues without constantly removing you from the nosy list.
msg409884 - (view) Author: Joannah Nanjekye (nanjekyejoannah) * (Python committer) Date: 2022-01-06 17:17
> You mention here that your tool automatically "repairs" the code. Could you >submit a sample PR with the repairs that your tool does so we can evaluate it?

I second this.

I obviously, am assuming good intentions from the author, not experimenting on us.
msg409895 - (view) Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2022-01-06 18:41
> I don't see how this could be an uninitialized read, although I'm willing to be wrong.

It can be uninitialized if the parenstack[nested_depth] value is itself initialized, which can happen if the memory block pointed by parenstack has not been initialized to some value after malloc'ed and parenstack[nested_depth] never got a value.

But yeah, a lot of hypotheticals here that I am sure are not possible in the actual code.
msg409902 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2022-01-06 19:23
#244 is a false positive.  The value of new_state[i] on line 454 was initialized on line 442 with: new_state[i] = (uint32_t)element.

#387 is also a false positive.  There is an assertion on the previous line that the item != NULL.  That assertion passes because item is the result of a call to deque_popleft() which only returns NULL when the size is zero; however, deque_del_item() is not ever called with size 0 and it has an assertion to that effect.  The two callers are deque_remove() and deque_ass_item().  Both calls are guarded by regular tests (not assertions) so that they are only called when the index >= 0 and index < len(deque).
msg409904 - (view) Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2022-01-06 19:36
I don't want to belabor this, but hey, it's in f-strings! And if it's an actual problem I'd like to fix it.

> It can be uninitialized if the parenstack[nested_depth] value is itself initialized, which can happen if the memory block pointed by parenstack has not been initialized to some value after malloc'ed and parenstack[nested_depth] never got a value.

parenstack is allocated on the stack, not that that changes the discussion much.

It looks like everywhere that parenstack[nested_depth] is read, it's already been written to.

Anyway, (if I'm right,) this makes my points that a) there are false positives, and b) we should have separate issues for each actual problem.
msg409907 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2022-01-06 19:50
#324 and #325 are false positives.  The result variable is initialized in the preceding lines:

    if (len_a == length) {
        left = *((volatile const unsigned char**)&a);
        result = 0;
    }
    if (len_a != length) {
        left = b;
        result = 1;
    }

While the code is correct, the second test should be changed to "else".
msg409909 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2022-01-06 20:08
#382 is false positive.  The "iterable" variable is only accessed when known to not be NULL.

    # Line 970
    if (iterable != NULL) {
        if (set_update_internal(so, iterable)) {
            Py_DECREF(so);
            return NULL;
        }
    }
msg409911 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2022-01-06 20:13
#420 and #421 are false positives.  The value of "c" is initialized a few lines before use.

        for (;;) {
            c = tok_nextc(tok);
            ...
        }
        ...
        tok_backup(tok, c);
        if (c == '#' || c == '\n' || c == '\\') {
           ...
        }
msg409912 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2022-01-06 20:14
Last I knew, CPython C code is a) regularly scanned by Valgrind and b) Valgrind is somehow informed as to false positives to not report.  But I know none of the details.  So I suggest you look into this and how to not report the same false positives.

I suggest working with the 'main' branch (future 3.11) as nearly all patches are applied there first and backported as appropriate.

Infer is a facebook github project, used by by other big corps and projects, so it may be worthwhile working with if false positives can be suppressed.
msg409918 - (view) Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2022-01-06 20:24
> Anyway, (if I'm right,) this makes my points that a) there are false positives, and b) we should have separate issues for each actual problem.

Sorry Eric, I failed to clarify my comment: you are absolutely right in your analysis. I was trying to backtrack what the tool is thinking and how that code could result in an initialized read based only on static analysis.

Your analysis is right and these are indeed false positives. Apologies for the confusion :)
msg409921 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2022-01-06 20:32
The dead store notices for all the DISPATCH calls in ceval.c are false positives.  The "oparg" value is used in many of the case statements.

The dead store notices the clinic generated code all relate to "!--noptargs" which is sometimes used in generated code and sometimes not.  By always making the assignment, the clinic generator code is made less stateful (I would say "simpler" but that part of clinic.py is a mess).
msg409922 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2022-01-06 20:42
#511 and #512 are false positives.  The "kind" variable is indeed uninitialized in the bytes_writer case:

    else if (bytes_writer) {
        *bytes_str = _PyBytesWriter_Prepare(bytes_writer, *bytes_str, strlen);
        if (*bytes_str == NULL) {
            Py_DECREF(scratch);
            return -1;
        }
    }

However, the flagged sections cannot be called in the bytes_writer case:

    if (bytes_writer) {                  <=="kind" not used here
        char *p = *bytes_str + strlen;
        WRITE_DIGITS(p);
        assert(p == *bytes_str);
    }
    else if (kind == PyUnicode_1BYTE_KIND) {  <== bytes_write is false
        Py_UCS1 *p;
        WRITE_UNICODE_DIGITS(Py_UCS1);
    }
    else if (kind == PyUnicode_2BYTE_KIND) {  <== bytes_write is false
        Py_UCS2 *p;
        WRITE_UNICODE_DIGITS(Py_UCS2);
    }
    else {
        Py_UCS4 *p;
        assert (kind == PyUnicode_4BYTE_KIND);
        WRITE_UNICODE_DIGITS(Py_UCS4);
    }
msg409923 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2022-01-06 20:49
#533, #534, #535, and #536 are false positives for the same reason as #511 and #512.  The two "dead stores" in 533 and 534 match the "uninitialized variables" in 535 and 536.
msg409925 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2022-01-06 21:02
The CPython source code is irregularly scanned by different code analysis tools. The results shown extremely high quality of code in comparison with other open source and proprietary code. Most of reports are false positive. Last time real bugs (2 or 3) was discovered by tools several years ago, and one of these bugs was already known and did have a patch on review.

So while new tools can discover new bugs (unnoticed by previous scans or recently added), it is expected that most or all reports be false positive.
msg409926 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2022-01-06 21:03
#584 and #585: The code is correct but there are dead stores only when the asserts are turned off:

  2635.     carry = v_lshift(w->ob_digit, w1->ob_digit, size_w, d);
  2636.     assert(carry == 0);

Elsewhere we use ifdefs around code like this to suppress warnings about unused variables.  In this case though, it would be messy and cause code duplication.
msg409927 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2022-01-06 21:06
#606 is similar to #584 and #585.  The "dead store" is used only in an assertion:

            have_dict = 1;    <== Presumed dead store
        }
        assert(have_dict);    <== Used in an assert

In the case, it would be reasonable to add an #ifdef.
msg409929 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2022-01-06 21:12
#632 should be left as is.  Technically, the second "field++" is a dead store.  However, it is harmless and has some advantages.  It keeps the the assignments parallel and it reduces the chance of introducing a new bug if a new field is added (i.e. like one of the reasons we use trailing commas on multiline dicts and lists).

    PyStructSequence_SET_ITEM(int_info, field++,
                              PyLong_FromLong(PyLong_SHIFT));
    PyStructSequence_SET_ITEM(int_info, field++,
                              PyLong_FromLong(sizeof(digit)));
msg409930 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2022-01-06 21:21
#654 is a false positive.  The value of ptrs[0] is initialized to NULL via a pointer alias a few lines before:

    pp = ptrs;
    ...
    *pp = NULL;
    ...
    if (ptrs[0] == NULL)
msg409933 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2022-01-06 22:05
Update on #324 and #325.  Not only are these false positives, but Serhiy pointed-out the existing logic is intentional and should not be rewritten.
msg410539 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2022-01-14 01:54
#389
Modules/_tracemalloc.c:1245: error: Null Dereference
  pointer `traces2` last assigned on line 1243 could be null and is dereferenced by call to `_Py_hashtable_destroy()` at line 1245, column 9.
  1243.     _Py_hashtable_t *traces2 = tracemalloc_copy_traces(traces);
  1244.     if (_Py_hashtable_set(domains2, TO_PTR(domain), traces2) < 0) {
  1245.         _Py_hashtable_destroy(traces2);
                ^
  1246.         return -1;
  1247.     }

That's a real bug: I wrote PR #30591 to fix it.


Whereas the following one must be ignored, since the function does crash (read at NULL) on purpose:

#360
Modules/faulthandler.c:1025: error: Null Dereference
  pointer `x` last assigned on line 1024 could be null and is dereferenced at line 1025, column 9.
  1023.     faulthandler_suppress_crash_report();
  1024.     x = NULL;
  1025.     y = *x;
                ^
  1026.     return PyLong_FromLong(y);
  1027.
msg410540 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2022-01-14 04:11
New changeset 7c770d3350813a82a639fcb3babae0de2b87aaae by Victor Stinner in branch 'main':
bpo-46280: Fix tracemalloc_copy_domain() (GH-30591)
https://github.com/python/cpython/commit/7c770d3350813a82a639fcb3babae0de2b87aaae
msg410541 - (view) Author: miss-islington (miss-islington) Date: 2022-01-14 04:32
New changeset 86d18019e96167c5ab6f5157fa90598202849904 by Miss Islington (bot) in branch '3.10':
bpo-46280: Fix tracemalloc_copy_domain() (GH-30591)
https://github.com/python/cpython/commit/86d18019e96167c5ab6f5157fa90598202849904
msg410542 - (view) Author: miss-islington (miss-islington) Date: 2022-01-14 04:35
New changeset ae6e255cb362557ff713ff2967aecb92f7eb069c by Miss Islington (bot) in branch '3.9':
bpo-46280: Fix tracemalloc_copy_domain() (GH-30591)
https://github.com/python/cpython/commit/ae6e255cb362557ff713ff2967aecb92f7eb069c
History
Date User Action Args
2022-01-14 04:35:25miss-islingtonsetmessages: + msg410542
2022-01-14 04:32:46miss-islingtonsetmessages: + msg410541
2022-01-14 04:11:55miss-islingtonsetpull_requests: + pull_request28791
2022-01-14 04:11:50miss-islingtonsetnosy: + miss-islington
pull_requests: + pull_request28790
2022-01-14 04:11:46vstinnersetmessages: + msg410540
2022-01-14 01:54:53vstinnersetmessages: + msg410539
2022-01-14 01:53:40vstinnersetnosy: + vstinner
pull_requests: + pull_request28789
2022-01-06 22:05:11rhettingersetmessages: + msg409933
2022-01-06 21:29:59rhettingersetkeywords: + patch
stage: patch review
pull_requests: + pull_request28651
2022-01-06 21:21:02rhettingersetmessages: + msg409930
2022-01-06 21:12:31rhettingersetmessages: + msg409929
2022-01-06 21:06:47rhettingersetmessages: + msg409927
2022-01-06 21:03:05rhettingersetmessages: + msg409926
2022-01-06 21:02:31serhiy.storchakasetnosy: + serhiy.storchaka
messages: + msg409925
2022-01-06 20:49:27rhettingersetmessages: + msg409923
2022-01-06 20:42:36rhettingersetmessages: + msg409922
2022-01-06 20:32:56rhettingersetmessages: + msg409921
2022-01-06 20:24:04pablogsalsetnosy: - urnotmax
messages: + msg409918
2022-01-06 20:16:07zach.waresetnosy: + urnotmax, - zach.ware
2022-01-06 20:14:55terry.reedysetnosy: + terry.reedy

messages: + msg409912
versions: + Python 3.11
2022-01-06 20:13:40rhettingersetmessages: + msg409911
2022-01-06 20:08:57rhettingersetmessages: + msg409909
2022-01-06 19:50:14rhettingersetmessages: + msg409907
2022-01-06 19:36:52eric.smithsetmessages: + msg409904
2022-01-06 19:23:38rhettingersetnosy: + rhettinger
messages: + msg409902
2022-01-06 18:41:10pablogsalsetmessages: + msg409895
2022-01-06 17:17:47nanjekyejoannahsetnosy: + nanjekyejoannah, - urnotmax
messages: + msg409884
2022-01-06 15:57:07zach.waresetnosy: + zach.ware
messages: + msg409868
2022-01-06 15:54:10zach.waresetnosy: + urnotmax
2022-01-06 15:17:25pablogsalsetnosy: + pablogsal
messages: + msg409859
2022-01-06 14:52:25eric.smithsetnosy: + eric.smith, - urnotmax
messages: + msg409854
2022-01-06 13:07:52urnotmaxcreate