About vulnerabilities in Cpython native code #90438

urnotmax · 2022-01-06T13:07:53Z

BPO	46280
Nosy	@rhettinger, @terryjreedy, @vstinner, @ericvsmith, @serhiy-storchaka, @pablogsal, @miss-islington, @nanjekyejoannah
PRs	bpo-46280: Minor code improvement flagged in subissues #324 and #325. #30445 bpo-46280: Fix tracemalloc_copy_domain() #30591 [3.10] bpo-46280: Fix tracemalloc_copy_domain() (GH-30591) #30592 [3.9] bpo-46280: Fix tracemalloc_copy_domain() (GH-30591) #30593
Files	CPython3.10.0_vulnerability_report.txt

^{Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.}

Show more details

GitHub fields:

assignee = None
closed_at = None
created_at = <Date 2022-01-06.13:07:52.809>
labels = ['type-security', 'build', '3.10', '3.11']
title = 'About vulnerabilities in Cpython native code'
updated_at = <Date 2022-01-14.04:35:25.101>
user = 'https://bugs.python.org/urnotmax'

bugs.python.org fields:

activity = <Date 2022-01-14.04:35:25.101>
actor = 'miss-islington'
assignee = 'none'
closed = False
closed_date = None
closer = None
components = ['Build']
creation = <Date 2022-01-06.13:07:52.809>
creator = 'urnotmax'
dependencies = []
files = ['50544']
hgrepos = []
issue_num = 46280
keywords = ['patch']
message_count = 26.0
messages = ['409841', '409854', '409859', '409868', '409884', '409895', '409902', '409904', '409907', '409909', '409911', '409912', '409918', '409921', '409922', '409923', '409925', '409926', '409927', '409929', '409930', '409933', '410539', '410540', '410541', '410542']
nosy_count = 8.0
nosy_names = ['rhettinger', 'terry.reedy', 'vstinner', 'eric.smith', 'serhiy.storchaka', 'pablogsal', 'miss-islington', 'nanjekyejoannah']
pr_nums = ['30445', '30591', '30592', '30593']
priority = 'normal'
resolution = None
stage = 'patch review'
status = 'open'
superseder = None
type = 'security'
url = 'https://bugs.python.org/issue46280'
versions = ['Python 3.10', 'Python 3.11']

urnotmax · 2022-01-06T13:07:48Z

I am currently doing some research on the security of CPython. I used the open source vulnerability analysis engine, Infer(https://fbinfer.com/), to scan the native code of CPython 3.10.0.

The scan results show that there are still a number of vulnerabilities in the CPython native code, such as Null dereference, Uninitialized variable, Resource/Memory leak, etc. Moreover, I found that some of the vulnerabilities are related to Python/C API. I enclose the vulnerability report for your reference.

Based on the research of the result, I tried to design a tool to automatically detect and repair vulnerabilities in CPython and make this tool available. See:

https://github.com/PVMPATCH/PVMPatch

Python is my favourite programming language. I sincerely hope that I can help Python become stronger and safer. I hope this discovery can be useful for you to develop Python in the future.

ericvsmith · 2022-01-06T14:52:26Z

Thank you for posting this.

Some of these look like false positives.

For example:
#263
Parser/string_parser.c:670: error: Uninitialized Value
The value read from parenstack[_] was never initialized.
668. }
669. nested_depth--;
670. int opening = (unsigned char)parenstack[nested_depth];
^
671. if (!((opening == '(' && ch == ')') ||
672. (opening == '[' && ch == ']') ||

I don't see how this could be an uninitialized read, although I'm willing to be wrong.

If your tool can produce patches to fix reported problems, I suggest that you create PRs for specific issues, so they can be reviewed individually. There's no way we'd review a single patch for all 673 issues that were identified.

Also, looking at the first one:
#0
Objects/clinic/bytearrayobject.c.h:50: error: Dead Store
The value written to &noptargs (type long) is never used.
48. goto exit;
49. }
50. if (!--noptargs) {
^
51. goto skip_optional_pos;
52. }

We've discussed this before. The consensus last time was to leave code like this in place, in case other code was added after this that refers to the same pointer. Our assumption is that compilers will remove the unneeded store. Is it possible to remove Dead Stores from the output, and/or produce a separate output with just Dead Stores? I don't see how a Dead Store can be a vulnerability.

pablogsal · 2022-01-06T15:17:25Z

Based on the research of the result, I tried to design a tool to automatically detect and repair vulnerabilities in CPython and make this tool available. See:

You mention here that your tool automatically "repairs" the code. Could you submit a sample PR with the repairs that your tool does so we can evaluate it?

zware · 2022-01-06T15:57:07Z

As an aside, there's an issue with Roundup where a username composed of all digits causes it to think that name is a user ID in the nosy list. I recommend changing your username to include a non-digit character so that others can interact with you on your issues without constantly removing you from the nosy list.

nanjekyejoannah · 2022-01-06T17:17:47Z

You mention here that your tool automatically "repairs" the code. Could you >submit a sample PR with the repairs that your tool does so we can evaluate it?

I second this.

I obviously, am assuming good intentions from the author, not experimenting on us.

pablogsal · 2022-01-06T18:41:11Z

I don't see how this could be an uninitialized read, although I'm willing to be wrong.

It can be uninitialized if the parenstack[nested_depth] value is itself initialized, which can happen if the memory block pointed by parenstack has not been initialized to some value after malloc'ed and parenstack[nested_depth] never got a value.

But yeah, a lot of hypotheticals here that I am sure are not possible in the actual code.

rhettinger · 2022-01-06T19:23:38Z

#244 is a false positive. The value of new_state[i] on line 454 was initialized on line 442 with: new_state[i] = (uint32_t)element.

#387 is also a false positive. There is an assertion on the previous line that the item != NULL. That assertion passes because item is the result of a call to deque_popleft() which only returns NULL when the size is zero; however, deque_del_item() is not ever called with size 0 and it has an assertion to that effect. The two callers are deque_remove() and deque_ass_item(). Both calls are guarded by regular tests (not assertions) so that they are only called when the index >= 0 and index < len(deque).

ericvsmith · 2022-01-06T19:36:52Z

I don't want to belabor this, but hey, it's in f-strings! And if it's an actual problem I'd like to fix it.

It can be uninitialized if the parenstack[nested_depth] value is itself initialized, which can happen if the memory block pointed by parenstack has not been initialized to some value after malloc'ed and parenstack[nested_depth] never got a value.

parenstack is allocated on the stack, not that that changes the discussion much.

It looks like everywhere that parenstack[nested_depth] is read, it's already been written to.

Anyway, (if I'm right,) this makes my points that a) there are false positives, and b) we should have separate issues for each actual problem.

rhettinger · 2022-01-06T19:50:14Z

#324 and #325 are false positives. The result variable is initialized in the preceding lines:

    if (len_a == length) {
        left = *((volatile const unsigned char**)&a);
        result = 0;
    }
    if (len_a != length) {
        left = b;
        result = 1;
    }

While the code is correct, the second test should be changed to "else".

rhettinger · 2022-01-06T20:08:58Z

#382 is false positive. The "iterable" variable is only accessed when known to not be NULL.

    # Line 970
    if (iterable != NULL) {
        if (set_update_internal(so, iterable)) {
            Py_DECREF(so);
            return NULL;
        }
    }

rhettinger · 2022-01-06T20:13:41Z

#420 and #421 are false positives. The value of "c" is initialized a few lines before use.

        for (;;) {
            c = tok_nextc(tok);
            ...
        }
        ...
        tok_backup(tok, c);
        if (c == '#' || c == '\n' || c == '\\') {
           ...
        }

terryjreedy · 2022-01-06T20:14:55Z

Last I knew, CPython C code is a) regularly scanned by Valgrind and b) Valgrind is somehow informed as to false positives to not report. But I know none of the details. So I suggest you look into this and how to not report the same false positives.

I suggest working with the 'main' branch (future 3.11) as nearly all patches are applied there first and backported as appropriate.

Infer is a facebook github project, used by by other big corps and projects, so it may be worthwhile working with if false positives can be suppressed.

pablogsal · 2022-01-06T20:24:05Z

Anyway, (if I'm right,) this makes my points that a) there are false positives, and b) we should have separate issues for each actual problem.

Sorry Eric, I failed to clarify my comment: you are absolutely right in your analysis. I was trying to backtrack what the tool is thinking and how that code could result in an initialized read based only on static analysis.

Your analysis is right and these are indeed false positives. Apologies for the confusion :)

rhettinger · 2022-01-06T20:32:56Z

The dead store notices for all the DISPATCH calls in ceval.c are false positives. The "oparg" value is used in many of the case statements.

The dead store notices the clinic generated code all relate to "!--noptargs" which is sometimes used in generated code and sometimes not. By always making the assignment, the clinic generator code is made less stateful (I would say "simpler" but that part of clinic.py is a mess).

rhettinger · 2022-01-06T20:42:36Z

#511 and #512 are false positives. The "kind" variable is indeed uninitialized in the bytes_writer case:

    else if (bytes_writer) {
        *bytes_str = _PyBytesWriter_Prepare(bytes_writer, *bytes_str, strlen);
        if (*bytes_str == NULL) {
            Py_DECREF(scratch);
            return -1;
        }
    }

However, the flagged sections cannot be called in the bytes_writer case:

    if (bytes_writer) {                  <=="kind" not used here
        char *p = *bytes_str + strlen;
        WRITE_DIGITS(p);
        assert(p == *bytes_str);
    }
    else if (kind == PyUnicode_1BYTE_KIND) {  <== bytes_write is false
        Py_UCS1 *p;
        WRITE_UNICODE_DIGITS(Py_UCS1);
    }
    else if (kind == PyUnicode_2BYTE_KIND) {  <== bytes_write is false
        Py_UCS2 *p;
        WRITE_UNICODE_DIGITS(Py_UCS2);
    }
    else {
        Py_UCS4 *p;
        assert (kind == PyUnicode_4BYTE_KIND);
        WRITE_UNICODE_DIGITS(Py_UCS4);
    }

rhettinger · 2022-01-06T20:49:28Z

#533, #534, #535, and #536 are false positives for the same reason as #511 and #512. The two "dead stores" in 533 and 534 match the "uninitialized variables" in 535 and 536.

serhiy-storchaka · 2022-01-06T21:02:32Z

The CPython source code is irregularly scanned by different code analysis tools. The results shown extremely high quality of code in comparison with other open source and proprietary code. Most of reports are false positive. Last time real bugs (2 or 3) was discovered by tools several years ago, and one of these bugs was already known and did have a patch on review.

So while new tools can discover new bugs (unnoticed by previous scans or recently added), it is expected that most or all reports be false positive.

rhettinger · 2022-01-06T21:03:05Z

#584 and #585: The code is correct but there are dead stores only when the asserts are turned off:

carry = v_lshift(w-\>ob_digit, w1-\>ob_digit, size_w, d);

```
assert(carry == 0);
```

Elsewhere we use ifdefs around code like this to suppress warnings about unused variables. In this case though, it would be messy and cause code duplication.

rhettinger · 2022-01-06T21:06:47Z

#606 is similar to #584 and #585. The "dead store" is used only in an assertion:

            have_dict = 1;    <== Presumed dead store
        }
        assert(have_dict);    <== Used in an assert

In the case, it would be reasonable to add an #ifdef.

rhettinger · 2022-01-06T21:12:31Z

#632 should be left as is. Technically, the second "field++" is a dead store. However, it is harmless and has some advantages. It keeps the the assignments parallel and it reduces the chance of introducing a new bug if a new field is added (i.e. like one of the reasons we use trailing commas on multiline dicts and lists).

    PyStructSequence_SET_ITEM(int_info, field++,
                              PyLong_FromLong(PyLong_SHIFT));
    PyStructSequence_SET_ITEM(int_info, field++,
                              PyLong_FromLong(sizeof(digit)));

rhettinger · 2022-01-06T21:21:03Z

#654 is a false positive. The value of ptrs[0] is initialized to NULL via a pointer alias a few lines before:

    pp = ptrs;
    ...
    *pp = NULL;
    ...
    if (ptrs[0] == NULL)

rhettinger · 2022-01-06T22:05:11Z

Update on #324 and #325. Not only are these false positives, but Serhiy pointed-out the existing logic is intentional and should not be rewritten.

vstinner · 2022-01-14T01:54:53Z

#389
Modules/_tracemalloc.c:1245: error: Null Dereference
pointer traces2 last assigned on line 1243 could be null and is dereferenced by call to _Py_hashtable_destroy() at line 1245, column 9.
1243. _Py_hashtable_t *traces2 = tracemalloc_copy_traces(traces);
1244. if (_Py_hashtable_set(domains2, TO_PTR(domain), traces2) < 0) {
1245. _Py_hashtable_destroy(traces2);
^
1246. return -1;
1247. }

That's a real bug: I wrote PR bpo-30591 to fix it.

Whereas the following one must be ignored, since the function does crash (read at NULL) on purpose:

#360
Modules/faulthandler.c:1025: error: Null Dereference
pointer x last assigned on line 1024 could be null and is dereferenced at line 1025, column 9.
1023. faulthandler_suppress_crash_report();
1024. x = NULL;
1025. y = *x;
^
1026. return PyLong_FromLong(y);
1027.

vstinner · 2022-01-14T04:11:46Z

New changeset 7c770d3 by Victor Stinner in branch 'main':
bpo-46280: Fix tracemalloc_copy_domain() (GH-30591)
7c770d3

miss-islington · 2022-01-14T04:32:47Z

New changeset 86d1801 by Miss Islington (bot) in branch '3.10':
bpo-46280: Fix tracemalloc_copy_domain() (GH-30591)
86d1801

miss-islington · 2022-01-14T04:35:25Z

New changeset ae6e255 by Miss Islington (bot) in branch '3.9':
bpo-46280: Fix tracemalloc_copy_domain() (GH-30591)
ae6e255

zhuofeng6 · 2022-09-09T01:37:50Z

any question about this issue? if not, i think it is better to close it

urnotmax mannequin added 3.10 only security fixes build The build process and cross-build type-security A security issue labels Jan 6, 2022

terryjreedy added 3.11 only security fixes labels Jan 6, 2022

ezio-melotti transferred this issue from another repository Apr 10, 2022

ericvsmith closed this as completed Sep 9, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

About vulnerabilities in Cpython native code #90438

About vulnerabilities in Cpython native code #90438

urnotmax mannequin commented Jan 6, 2022

urnotmax mannequin commented Jan 6, 2022

ericvsmith commented Jan 6, 2022

pablogsal commented Jan 6, 2022

zware commented Jan 6, 2022

nanjekyejoannah commented Jan 6, 2022

pablogsal commented Jan 6, 2022

rhettinger commented Jan 6, 2022

ericvsmith commented Jan 6, 2022

rhettinger commented Jan 6, 2022

rhettinger commented Jan 6, 2022

rhettinger commented Jan 6, 2022

terryjreedy commented Jan 6, 2022

pablogsal commented Jan 6, 2022

rhettinger commented Jan 6, 2022

rhettinger commented Jan 6, 2022

rhettinger commented Jan 6, 2022

serhiy-storchaka commented Jan 6, 2022

rhettinger commented Jan 6, 2022

rhettinger commented Jan 6, 2022

rhettinger commented Jan 6, 2022

rhettinger commented Jan 6, 2022

rhettinger commented Jan 6, 2022

vstinner commented Jan 14, 2022

vstinner commented Jan 14, 2022

miss-islington commented Jan 14, 2022

miss-islington commented Jan 14, 2022

zhuofeng6 commented Sep 9, 2022

About vulnerabilities in Cpython native code #90438

About vulnerabilities in Cpython native code #90438

Comments

urnotmax mannequin commented Jan 6, 2022

urnotmax mannequin commented Jan 6, 2022

ericvsmith commented Jan 6, 2022

pablogsal commented Jan 6, 2022

zware commented Jan 6, 2022

nanjekyejoannah commented Jan 6, 2022

pablogsal commented Jan 6, 2022

rhettinger commented Jan 6, 2022

ericvsmith commented Jan 6, 2022

rhettinger commented Jan 6, 2022

rhettinger commented Jan 6, 2022

rhettinger commented Jan 6, 2022

terryjreedy commented Jan 6, 2022

pablogsal commented Jan 6, 2022

rhettinger commented Jan 6, 2022

rhettinger commented Jan 6, 2022

rhettinger commented Jan 6, 2022

serhiy-storchaka commented Jan 6, 2022

rhettinger commented Jan 6, 2022

rhettinger commented Jan 6, 2022

rhettinger commented Jan 6, 2022

rhettinger commented Jan 6, 2022

rhettinger commented Jan 6, 2022

vstinner commented Jan 14, 2022

vstinner commented Jan 14, 2022

miss-islington commented Jan 14, 2022

miss-islington commented Jan 14, 2022

zhuofeng6 commented Sep 9, 2022