msg409841 - (view) |
Author: Lin (urnotmax) * |
Date: 2022-01-06 13:07 |
I am currently doing some research on the security of CPython. I used the open source vulnerability analysis engine, Infer(https://fbinfer.com/), to scan the native code of CPython 3.10.0.
The scan results show that there are still a number of vulnerabilities in the CPython native code, such as Null dereference, Uninitialized variable, Resource/Memory leak, etc. Moreover, I found that some of the vulnerabilities are related to Python/C API. I enclose the vulnerability report for your reference.
Based on the research of the result, I tried to design a tool to automatically detect and repair vulnerabilities in CPython and make this tool available. See:
https://github.com/PVMPATCH/PVMPatch
Python is my favourite programming language. I sincerely hope that I can help Python become stronger and safer. I hope this discovery can be useful for you to develop Python in the future.
|
msg409854 - (view) |
Author: Eric V. Smith (eric.smith) * |
Date: 2022-01-06 14:52 |
Thank you for posting this.
Some of these look like false positives.
For example:
#263
Parser/string_parser.c:670: error: Uninitialized Value
The value read from parenstack[_] was never initialized.
668. }
669. nested_depth--;
670. int opening = (unsigned char)parenstack[nested_depth];
^
671. if (!((opening == '(' && ch == ')') ||
672. (opening == '[' && ch == ']') ||
I don't see how this could be an uninitialized read, although I'm willing to be wrong.
If your tool can produce patches to fix reported problems, I suggest that you create PRs for specific issues, so they can be reviewed individually. There's no way we'd review a single patch for all 673 issues that were identified.
Also, looking at the first one:
#0
Objects/clinic/bytearrayobject.c.h:50: error: Dead Store
The value written to &noptargs (type long) is never used.
48. goto exit;
49. }
50. if (!--noptargs) {
^
51. goto skip_optional_pos;
52. }
We've discussed this before. The consensus last time was to leave code like this in place, in case other code was added after this that refers to the same pointer. Our assumption is that compilers will remove the unneeded store. Is it possible to remove Dead Stores from the output, and/or produce a separate output with just Dead Stores? I don't see how a Dead Store can be a vulnerability.
|
msg409859 - (view) |
Author: Pablo Galindo Salgado (pablogsal) * |
Date: 2022-01-06 15:17 |
> Based on the research of the result, I tried to design a tool to automatically detect and repair vulnerabilities in CPython and make this tool available. See:
You mention here that your tool automatically "repairs" the code. Could you submit a sample PR with the repairs that your tool does so we can evaluate it?
|
msg409868 - (view) |
Author: Zachary Ware (zach.ware) * |
Date: 2022-01-06 15:57 |
As an aside, there's an issue with Roundup where a username composed of all digits causes it to think that name is a user ID in the nosy list. I recommend changing your username to include a non-digit character so that others can interact with you on your issues without constantly removing you from the nosy list.
|
msg409884 - (view) |
Author: Joannah Nanjekye (nanjekyejoannah) * |
Date: 2022-01-06 17:17 |
> You mention here that your tool automatically "repairs" the code. Could you >submit a sample PR with the repairs that your tool does so we can evaluate it?
I second this.
I obviously, am assuming good intentions from the author, not experimenting on us.
|
msg409895 - (view) |
Author: Pablo Galindo Salgado (pablogsal) * |
Date: 2022-01-06 18:41 |
> I don't see how this could be an uninitialized read, although I'm willing to be wrong.
It can be uninitialized if the parenstack[nested_depth] value is itself initialized, which can happen if the memory block pointed by parenstack has not been initialized to some value after malloc'ed and parenstack[nested_depth] never got a value.
But yeah, a lot of hypotheticals here that I am sure are not possible in the actual code.
|
msg409902 - (view) |
Author: Raymond Hettinger (rhettinger) * |
Date: 2022-01-06 19:23 |
#244 is a false positive. The value of new_state[i] on line 454 was initialized on line 442 with: new_state[i] = (uint32_t)element.
#387 is also a false positive. There is an assertion on the previous line that the item != NULL. That assertion passes because item is the result of a call to deque_popleft() which only returns NULL when the size is zero; however, deque_del_item() is not ever called with size 0 and it has an assertion to that effect. The two callers are deque_remove() and deque_ass_item(). Both calls are guarded by regular tests (not assertions) so that they are only called when the index >= 0 and index < len(deque).
|
msg409904 - (view) |
Author: Eric V. Smith (eric.smith) * |
Date: 2022-01-06 19:36 |
I don't want to belabor this, but hey, it's in f-strings! And if it's an actual problem I'd like to fix it.
> It can be uninitialized if the parenstack[nested_depth] value is itself initialized, which can happen if the memory block pointed by parenstack has not been initialized to some value after malloc'ed and parenstack[nested_depth] never got a value.
parenstack is allocated on the stack, not that that changes the discussion much.
It looks like everywhere that parenstack[nested_depth] is read, it's already been written to.
Anyway, (if I'm right,) this makes my points that a) there are false positives, and b) we should have separate issues for each actual problem.
|
msg409907 - (view) |
Author: Raymond Hettinger (rhettinger) * |
Date: 2022-01-06 19:50 |
#324 and #325 are false positives. The result variable is initialized in the preceding lines:
if (len_a == length) {
left = *((volatile const unsigned char**)&a);
result = 0;
}
if (len_a != length) {
left = b;
result = 1;
}
While the code is correct, the second test should be changed to "else".
|
msg409909 - (view) |
Author: Raymond Hettinger (rhettinger) * |
Date: 2022-01-06 20:08 |
#382 is false positive. The "iterable" variable is only accessed when known to not be NULL.
# Line 970
if (iterable != NULL) {
if (set_update_internal(so, iterable)) {
Py_DECREF(so);
return NULL;
}
}
|
msg409911 - (view) |
Author: Raymond Hettinger (rhettinger) * |
Date: 2022-01-06 20:13 |
#420 and #421 are false positives. The value of "c" is initialized a few lines before use.
for (;;) {
c = tok_nextc(tok);
...
}
...
tok_backup(tok, c);
if (c == '#' || c == '\n' || c == '\\') {
...
}
|
msg409912 - (view) |
Author: Terry J. Reedy (terry.reedy) * |
Date: 2022-01-06 20:14 |
Last I knew, CPython C code is a) regularly scanned by Valgrind and b) Valgrind is somehow informed as to false positives to not report. But I know none of the details. So I suggest you look into this and how to not report the same false positives.
I suggest working with the 'main' branch (future 3.11) as nearly all patches are applied there first and backported as appropriate.
Infer is a facebook github project, used by by other big corps and projects, so it may be worthwhile working with if false positives can be suppressed.
|
msg409918 - (view) |
Author: Pablo Galindo Salgado (pablogsal) * |
Date: 2022-01-06 20:24 |
> Anyway, (if I'm right,) this makes my points that a) there are false positives, and b) we should have separate issues for each actual problem.
Sorry Eric, I failed to clarify my comment: you are absolutely right in your analysis. I was trying to backtrack what the tool is thinking and how that code could result in an initialized read based only on static analysis.
Your analysis is right and these are indeed false positives. Apologies for the confusion :)
|
msg409921 - (view) |
Author: Raymond Hettinger (rhettinger) * |
Date: 2022-01-06 20:32 |
The dead store notices for all the DISPATCH calls in ceval.c are false positives. The "oparg" value is used in many of the case statements.
The dead store notices the clinic generated code all relate to "!--noptargs" which is sometimes used in generated code and sometimes not. By always making the assignment, the clinic generator code is made less stateful (I would say "simpler" but that part of clinic.py is a mess).
|
msg409922 - (view) |
Author: Raymond Hettinger (rhettinger) * |
Date: 2022-01-06 20:42 |
#511 and #512 are false positives. The "kind" variable is indeed uninitialized in the bytes_writer case:
else if (bytes_writer) {
*bytes_str = _PyBytesWriter_Prepare(bytes_writer, *bytes_str, strlen);
if (*bytes_str == NULL) {
Py_DECREF(scratch);
return -1;
}
}
However, the flagged sections cannot be called in the bytes_writer case:
if (bytes_writer) { <=="kind" not used here
char *p = *bytes_str + strlen;
WRITE_DIGITS(p);
assert(p == *bytes_str);
}
else if (kind == PyUnicode_1BYTE_KIND) { <== bytes_write is false
Py_UCS1 *p;
WRITE_UNICODE_DIGITS(Py_UCS1);
}
else if (kind == PyUnicode_2BYTE_KIND) { <== bytes_write is false
Py_UCS2 *p;
WRITE_UNICODE_DIGITS(Py_UCS2);
}
else {
Py_UCS4 *p;
assert (kind == PyUnicode_4BYTE_KIND);
WRITE_UNICODE_DIGITS(Py_UCS4);
}
|
msg409923 - (view) |
Author: Raymond Hettinger (rhettinger) * |
Date: 2022-01-06 20:49 |
#533, #534, #535, and #536 are false positives for the same reason as #511 and #512. The two "dead stores" in 533 and 534 match the "uninitialized variables" in 535 and 536.
|
msg409925 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2022-01-06 21:02 |
The CPython source code is irregularly scanned by different code analysis tools. The results shown extremely high quality of code in comparison with other open source and proprietary code. Most of reports are false positive. Last time real bugs (2 or 3) was discovered by tools several years ago, and one of these bugs was already known and did have a patch on review.
So while new tools can discover new bugs (unnoticed by previous scans or recently added), it is expected that most or all reports be false positive.
|
msg409926 - (view) |
Author: Raymond Hettinger (rhettinger) * |
Date: 2022-01-06 21:03 |
#584 and #585: The code is correct but there are dead stores only when the asserts are turned off:
2635. carry = v_lshift(w->ob_digit, w1->ob_digit, size_w, d);
2636. assert(carry == 0);
Elsewhere we use ifdefs around code like this to suppress warnings about unused variables. In this case though, it would be messy and cause code duplication.
|
msg409927 - (view) |
Author: Raymond Hettinger (rhettinger) * |
Date: 2022-01-06 21:06 |
#606 is similar to #584 and #585. The "dead store" is used only in an assertion:
have_dict = 1; <== Presumed dead store
}
assert(have_dict); <== Used in an assert
In the case, it would be reasonable to add an #ifdef.
|
msg409929 - (view) |
Author: Raymond Hettinger (rhettinger) * |
Date: 2022-01-06 21:12 |
#632 should be left as is. Technically, the second "field++" is a dead store. However, it is harmless and has some advantages. It keeps the the assignments parallel and it reduces the chance of introducing a new bug if a new field is added (i.e. like one of the reasons we use trailing commas on multiline dicts and lists).
PyStructSequence_SET_ITEM(int_info, field++,
PyLong_FromLong(PyLong_SHIFT));
PyStructSequence_SET_ITEM(int_info, field++,
PyLong_FromLong(sizeof(digit)));
|
msg409930 - (view) |
Author: Raymond Hettinger (rhettinger) * |
Date: 2022-01-06 21:21 |
#654 is a false positive. The value of ptrs[0] is initialized to NULL via a pointer alias a few lines before:
pp = ptrs;
...
*pp = NULL;
...
if (ptrs[0] == NULL)
|
msg409933 - (view) |
Author: Raymond Hettinger (rhettinger) * |
Date: 2022-01-06 22:05 |
Update on #324 and #325. Not only are these false positives, but Serhiy pointed-out the existing logic is intentional and should not be rewritten.
|
msg410539 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2022-01-14 01:54 |
#389
Modules/_tracemalloc.c:1245: error: Null Dereference
pointer `traces2` last assigned on line 1243 could be null and is dereferenced by call to `_Py_hashtable_destroy()` at line 1245, column 9.
1243. _Py_hashtable_t *traces2 = tracemalloc_copy_traces(traces);
1244. if (_Py_hashtable_set(domains2, TO_PTR(domain), traces2) < 0) {
1245. _Py_hashtable_destroy(traces2);
^
1246. return -1;
1247. }
That's a real bug: I wrote PR #30591 to fix it.
Whereas the following one must be ignored, since the function does crash (read at NULL) on purpose:
#360
Modules/faulthandler.c:1025: error: Null Dereference
pointer `x` last assigned on line 1024 could be null and is dereferenced at line 1025, column 9.
1023. faulthandler_suppress_crash_report();
1024. x = NULL;
1025. y = *x;
^
1026. return PyLong_FromLong(y);
1027.
|
msg410540 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2022-01-14 04:11 |
New changeset 7c770d3350813a82a639fcb3babae0de2b87aaae by Victor Stinner in branch 'main':
bpo-46280: Fix tracemalloc_copy_domain() (GH-30591)
https://github.com/python/cpython/commit/7c770d3350813a82a639fcb3babae0de2b87aaae
|
msg410541 - (view) |
Author: miss-islington (miss-islington) |
Date: 2022-01-14 04:32 |
New changeset 86d18019e96167c5ab6f5157fa90598202849904 by Miss Islington (bot) in branch '3.10':
bpo-46280: Fix tracemalloc_copy_domain() (GH-30591)
https://github.com/python/cpython/commit/86d18019e96167c5ab6f5157fa90598202849904
|
msg410542 - (view) |
Author: miss-islington (miss-islington) |
Date: 2022-01-14 04:35 |
New changeset ae6e255cb362557ff713ff2967aecb92f7eb069c by Miss Islington (bot) in branch '3.9':
bpo-46280: Fix tracemalloc_copy_domain() (GH-30591)
https://github.com/python/cpython/commit/ae6e255cb362557ff713ff2967aecb92f7eb069c
|
|
Date |
User |
Action |
Args |
2022-04-11 14:59:54 | admin | set | github: 90438 |
2022-01-14 04:35:25 | miss-islington | set | messages:
+ msg410542 |
2022-01-14 04:32:46 | miss-islington | set | messages:
+ msg410541 |
2022-01-14 04:11:55 | miss-islington | set | pull_requests:
+ pull_request28791 |
2022-01-14 04:11:50 | miss-islington | set | nosy:
+ miss-islington pull_requests:
+ pull_request28790
|
2022-01-14 04:11:46 | vstinner | set | messages:
+ msg410540 |
2022-01-14 01:54:53 | vstinner | set | messages:
+ msg410539 |
2022-01-14 01:53:40 | vstinner | set | nosy:
+ vstinner pull_requests:
+ pull_request28789
|
2022-01-06 22:05:11 | rhettinger | set | messages:
+ msg409933 |
2022-01-06 21:29:59 | rhettinger | set | keywords:
+ patch stage: patch review pull_requests:
+ pull_request28651 |
2022-01-06 21:21:02 | rhettinger | set | messages:
+ msg409930 |
2022-01-06 21:12:31 | rhettinger | set | messages:
+ msg409929 |
2022-01-06 21:06:47 | rhettinger | set | messages:
+ msg409927 |
2022-01-06 21:03:05 | rhettinger | set | messages:
+ msg409926 |
2022-01-06 21:02:31 | serhiy.storchaka | set | nosy:
+ serhiy.storchaka messages:
+ msg409925
|
2022-01-06 20:49:27 | rhettinger | set | messages:
+ msg409923 |
2022-01-06 20:42:36 | rhettinger | set | messages:
+ msg409922 |
2022-01-06 20:32:56 | rhettinger | set | messages:
+ msg409921 |
2022-01-06 20:24:04 | pablogsal | set | nosy:
- urnotmax messages:
+ msg409918
|
2022-01-06 20:16:07 | zach.ware | set | nosy:
+ urnotmax, - zach.ware |
2022-01-06 20:14:55 | terry.reedy | set | nosy:
+ terry.reedy
messages:
+ msg409912 versions:
+ Python 3.11 |
2022-01-06 20:13:40 | rhettinger | set | messages:
+ msg409911 |
2022-01-06 20:08:57 | rhettinger | set | messages:
+ msg409909 |
2022-01-06 19:50:14 | rhettinger | set | messages:
+ msg409907 |
2022-01-06 19:36:52 | eric.smith | set | messages:
+ msg409904 |
2022-01-06 19:23:38 | rhettinger | set | nosy:
+ rhettinger messages:
+ msg409902
|
2022-01-06 18:41:10 | pablogsal | set | messages:
+ msg409895 |
2022-01-06 17:17:47 | nanjekyejoannah | set | nosy:
+ nanjekyejoannah, - urnotmax messages:
+ msg409884 |
2022-01-06 15:57:07 | zach.ware | set | nosy:
+ zach.ware messages:
+ msg409868
|
2022-01-06 15:54:10 | zach.ware | set | nosy:
+ urnotmax
|
2022-01-06 15:17:25 | pablogsal | set | nosy:
+ pablogsal messages:
+ msg409859
|
2022-01-06 14:52:25 | eric.smith | set | nosy:
+ eric.smith, - urnotmax messages:
+ msg409854 |
2022-01-06 13:07:52 | urnotmax | create | |