About vulnerabilities in Cpython native code #90438
Comments
I am currently doing some research on the security of CPython. I used the open-source vulnerability analysis engine Infer (https://fbinfer.com/) to scan the native code of CPython 3.10.0. The scan results show that there are still a number of potential vulnerabilities in the CPython native code, such as null dereferences, uninitialized variables, and resource/memory leaks. Moreover, I found that some of these are related to the Python/C API. I enclose the vulnerability report for your reference. Based on this research, I designed a tool to automatically detect and repair vulnerabilities in CPython and have made it available here: https://github.com/PVMPATCH/PVMPatch Python is my favourite programming language, and I sincerely hope I can help make Python stronger and safer. I hope this discovery is useful for future Python development. |
Thank you for posting this. Some of these look like false positives. For example: I don't see how this could be an uninitialized read, although I'm willing to be wrong. If your tool can produce patches to fix reported problems, I suggest that you create PRs for specific issues, so they can be reviewed individually. There's no way we'd review a single patch for all 673 issues that were identified. Also, looking at the first one: We've discussed this before. The consensus last time was to leave code like this in place, in case other code was added after this that refers to the same pointer. Our assumption is that compilers will remove the unneeded store. Is it possible to remove Dead Stores from the output, and/or produce a separate output with just Dead Stores? I don't see how a Dead Store can be a vulnerability. |
You mention here that your tool automatically "repairs" the code. Could you submit a sample PR with the repairs that your tool does so we can evaluate it? |
As an aside, there's an issue with Roundup where a username composed of all digits causes it to think that name is a user ID in the nosy list. I recommend changing your username to include a non-digit character so that others can interact with you on your issues without constantly removing you from the nosy list. |
I second this. Obviously, I am assuming good intentions from the author, not that they are experimenting on us. |
It can be an uninitialized read if the parenstack[nested_depth] value is itself uninitialized, which can happen if the memory block pointed to by parenstack was never initialized to some value after being malloc'ed and parenstack[nested_depth] never got a value. But yeah, a lot of hypotheticals here that I am sure are not possible in the actual code. |
#244 is a false positive. The value of new_state[i] on line 454 was initialized on line 442 with: new_state[i] = (uint32_t)element. #387 is also a false positive. There is an assertion on the previous line that the item != NULL. That assertion passes because item is the result of a call to deque_popleft() which only returns NULL when the size is zero; however, deque_del_item() is not ever called with size 0 and it has an assertion to that effect. The two callers are deque_remove() and deque_ass_item(). Both calls are guarded by regular tests (not assertions) so that they are only called when the index >= 0 and index < len(deque). |
I don't want to belabor this, but hey, it's in f-strings! And if it's an actual problem I'd like to fix it.
parenstack is allocated on the stack, not that that changes the discussion much. It looks like everywhere that parenstack[nested_depth] is read, it's already been written to. Anyway, (if I'm right,) this makes my points that a) there are false positives, and b) we should have separate issues for each actual problem. |
#382 is a false positive. The "iterable" variable is only accessed when it is known to not be NULL (line 970):

    if (iterable != NULL) {
        if (set_update_internal(so, iterable)) {
            Py_DECREF(so);
            return NULL;
        }
    } |
Last I knew, CPython C code is a) regularly scanned by Valgrind and b) Valgrind is somehow told which false positives not to report. But I know none of the details, so I suggest you look into this and into how to avoid reporting the same false positives. I also suggest working with the 'main' branch (future 3.11), as nearly all patches are applied there first and backported as appropriate. Infer is a Facebook GitHub project, used by other big corporations and projects, so it may be worthwhile to work with it if false positives can be suppressed. |
Sorry Eric, I failed to clarify my comment: you are absolutely right in your analysis. I was trying to backtrack what the tool is thinking and how that code could result in an uninitialized read based only on static analysis. Your analysis is right and these are indeed false positives. Apologies for the confusion :) |
The dead store notices for all the DISPATCH calls in ceval.c are false positives: the "oparg" value is used in many of the case statements. The dead store notices for the clinic-generated code all relate to "!--noptargs", which is sometimes used in the generated code and sometimes not. By always making the assignment, the clinic generator code is made less stateful (I would say "simpler", but that part of clinic.py is a mess). |
#511 and #512 are false positives. The "kind" variable is indeed uninitialized in the bytes_writer case:

    else if (bytes_writer) {
        *bytes_str = _PyBytesWriter_Prepare(bytes_writer, *bytes_str, strlen);
        if (*bytes_str == NULL) {
            Py_DECREF(scratch);
            return -1;
        }
    }

However, the flagged sections cannot be reached in the bytes_writer case:

    if (bytes_writer) {                          /* "kind" not used here */
        char *p = *bytes_str + strlen;
        WRITE_DIGITS(p);
        assert(p == *bytes_str);
    }
    else if (kind == PyUnicode_1BYTE_KIND) {     /* bytes_writer is false */
        Py_UCS1 *p;
        WRITE_UNICODE_DIGITS(Py_UCS1);
    }
    else if (kind == PyUnicode_2BYTE_KIND) {     /* bytes_writer is false */
        Py_UCS2 *p;
        WRITE_UNICODE_DIGITS(Py_UCS2);
    }
    else {
        Py_UCS4 *p;
        assert(kind == PyUnicode_4BYTE_KIND);
        WRITE_UNICODE_DIGITS(Py_UCS4);
    } |
The CPython source code is irregularly scanned by different code analysis tools. The results show extremely high code quality in comparison with other open-source and proprietary code: most reports are false positives. The last time real bugs (2 or 3) were discovered by such tools was several years ago, and one of those bugs was already known and had a patch under review. So while new tools can discover new bugs (unnoticed by previous scans, or recently introduced), it is expected that most or all reports will be false positives. |
#584 and #585: The code is correct, but there are dead stores when the asserts are turned off. Elsewhere we use ifdefs around code like this to suppress warnings about unused variables; in this case, though, that would be messy and cause code duplication. |
#632 should be left as is. Technically, the second "field++" is a dead store. However, it is harmless and has some advantages: it keeps the assignments parallel, and it reduces the chance of introducing a new bug if a new field is added (much like one of the reasons we use trailing commas in multiline dicts and lists).

    PyStructSequence_SET_ITEM(int_info, field++,
                              PyLong_FromLong(PyLong_SHIFT));
    PyStructSequence_SET_ITEM(int_info, field++,
                              PyLong_FromLong(sizeof(digit))); |
#654 is a false positive. The value of ptrs[0] is initialized to NULL via a pointer alias a few lines earlier:

    pp = ptrs;
    ...
    *pp = NULL;
    ...
    if (ptrs[0] == NULL) |
#389 That's a real bug: I wrote PR bpo-30591 to fix it. The following one, however, must be ignored, since the function crashes (reads at NULL) on purpose: #360 |
Any questions about this issue? If not, I think it is better to close it. |