This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: elementtree calls PyDict_GetItem without owning a reference to the dict
Type: Stage: resolved
Components: Extension Modules Versions: Python 3.11, Python 3.10, Python 3.9
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: Tim Mitchell, corona10, iritkatriel, miss-islington, serhiy.storchaka, tehybel
Priority: normal Keywords: patch

Created on 2016-09-02 21:48 by tehybel, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
0001-issue27946-Hold-reference-to-dict-in-PyDict_GetItem.patch Tim Mitchell, 2016-09-13 00:15
Pull Requests
URL Status Linked Edit
PR 29915 merged serhiy.storchaka, 2021-12-04 18:45
PR 29919 merged miss-islington, 2021-12-05 12:23
PR 29920 merged miss-islington, 2021-12-05 12:23
Messages (8)
msg274276 - (view) Author: tehybel (tehybel) Date: 2016-09-02 21:48
I would like to describe an issue in the _elementtree module, and then
propose a fix which would prevent this type of bug everywhere in the
codebase.

The issue exists in _elementtree_Element_get_impl in
/Modules/_elementtree.c. Here is the code:

    static PyObject *
    _elementtree_Element_get_impl(ElementObject *self, PyObject *key,
                                  PyObject *default_value)
    {
        PyObject* value;

        if (...)
            value = default_value;
        else {
            value = PyDict_GetItem(self->extra->attrib, key);
            ...
        }
        ...
    }

We look up "key" in the dictionary, that is, in self->extra->attrib. This is 
done with the call to PyDict_GetItem(self->extra->attrib, key).

We need to hash the "key" object in order to find it in the dictionary. This
sometimes requires calling the key's __hash__ function, i.e., it might call on
to python code.

What happens if the python code gets the dictionary (self->extra->attrib) freed?
Then PyDict_GetItem will use it after it has been freed.

I've attached a proof-of-concept script which causes a use-after-free through
_elementtree_Element_get_impl due to this issue.

We could fix this by calling Py_INCREF on self->extra->attrib before calling
PyDict_GetItem. But grepping for "PyDict_GetItem.*\->" shows that there are many
places in the codebase where this situation occurs. Some of these may not have
much impact, but some of them likely will.

All these bugs could be fixed at once by changing PyDict_GetItem to call 
Py_INCREF on the dictionary before it hashes the key.



Here's a PoC for the _elementtree module.

--- begin script ---

import _elementtree as et

state = {
    "tag": "tag",
    "_children": [],
    "attrib": "attr",
    "tail": "tail",
    "text": "text",
}

class X:
    def __hash__(self):
        e.__setstate__(state) # this frees e->extra->attrib
        return 13

e = et.Element("elem", {1: 2})
e.get(X())

--- end script ---

Running it (64-bits Python 3.5.2, --with-pydebug) causes a use-after-free with
control of the program counter:

(gdb) r ./poc13.py
Starting program: /home/xx/Python-3.5.2/python ./poc13.py

Program received signal SIGSEGV, Segmentation fault.
0x00000000004939af in PyDict_GetItem (op=0x7ffff6d5b1a8, key=0x7ffff6d528e0) at Objects/dictobject.c:1079
1079            ep = (mp->ma_keys->dk_lookup)(mp, key, hash, &value_addr);
(gdb) p mp->ma_keys->dk_lookup
$8 = (dict_lookup_func) 0x7b7b7b7b7b7b7b7b
msg276128 - (view) Author: Tim Mitchell (Tim Mitchell) * Date: 2016-09-12 22:21
Verified problem.  Added patch to keep reference to dict in PyDict_GetItem.
msg276140 - (view) Author: Tim Mitchell (Tim Mitchell) * Date: 2016-09-13 00:17
Added test to patch
msg407553 - (view) Author: Irit Katriel (iritkatriel) * (Python committer) Date: 2021-12-02 22:54
This is a bug in elementtree - the caller should own a reference to the dict during the entire PyDict_GetItem call.

We won't add the refcount change you propose because it's not free and this is a hot function. 

The test in your patch doesn't fail for me (I tried on version 3.11 on a Mac). Are you still seeing the problem on version >= 3.9, or perhaps it was fixed in eaementtree by now?
msg407582 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2021-12-03 12:45
Concur with Irit. We should patch the caller of PyDict_GetItem, not PyDict_GetItem.
msg407681 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2021-12-04 18:55
The issue is no longer reproduced by the original test because of the cache for dict key tables. But it is not gone, and can be reproduced with modified test.

There may be many similar bugs in the Python core end extensions. Adding incref/decref in PyDict_GetItem and similar C API functions could fix many of them, but perhaps not all, and it would hit performance. I suppose modt of uses of PyDict_GetItem are safe.
msg407705 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2021-12-05 12:23
New changeset d15cdb2f32f572ce56d7120135da24b9fdce4c99 by Serhiy Storchaka in branch 'main':
bpo-27946: Fix possible crash in ElementTree.Element (GH-29915)
https://github.com/python/cpython/commit/d15cdb2f32f572ce56d7120135da24b9fdce4c99
msg407730 - (view) Author: miss-islington (miss-islington) Date: 2021-12-05 19:05
New changeset beb834292db54fea129dd073cc822179430cee52 by Miss Islington (bot) in branch '3.10':
bpo-27946: Fix possible crash in ElementTree.Element (GH-29915)
https://github.com/python/cpython/commit/beb834292db54fea129dd073cc822179430cee52
msg407731 - (view) Author: miss-islington (miss-islington) Date: 2021-12-05 19:05
New changeset 52a9a71fe682e47f6c78a9c34aa9a797ca632c86 by Miss Islington (bot) in branch '3.9':
bpo-27946: Fix possible crash in ElementTree.Element (GH-29915)
https://github.com/python/cpython/commit/52a9a71fe682e47f6c78a9c34aa9a797ca632c86
History
Date User Action Args
2022-04-11 14:58:35adminsetgithub: 72133
2021-12-05 19:07:13serhiy.storchakasetstatus: open -> closed
resolution: fixed
stage: patch review -> resolved
2021-12-05 19:05:06miss-islingtonsetmessages: + msg407731
2021-12-05 19:05:06miss-islingtonsetmessages: + msg407730
2021-12-05 12:23:29miss-islingtonsetpull_requests: + pull_request28144
2021-12-05 12:23:24serhiy.storchakasetmessages: + msg407705
2021-12-05 12:23:24miss-islingtonsetnosy: + miss-islington
pull_requests: + pull_request28143
2021-12-04 18:55:25serhiy.storchakasetmessages: + msg407681
versions: + Python 3.9, Python 3.10, Python 3.11, - Python 3.5, Python 3.6
2021-12-04 18:45:21serhiy.storchakasetstage: patch review
pull_requests: + pull_request28140
2021-12-04 15:18:36corona10setnosy: + corona10
2021-12-03 12:45:17serhiy.storchakasetnosy: + serhiy.storchaka
messages: + msg407582
2021-12-02 22:54:45iritkatrielsetnosy: + iritkatriel

messages: + msg407553
title: issues in elementtree and elsewhere due to PyDict_GetItem -> elementtree calls PyDict_GetItem without owning a reference to the dict
2016-09-13 00:17:57Tim Mitchellsetmessages: + msg276140
2016-09-13 00:16:38Tim Mitchellsetfiles: - 0001-issue27946-Hold-reference-to-dict-in-PyDict_GetItem.patch
2016-09-13 00:15:54Tim Mitchellsetfiles: + 0001-issue27946-Hold-reference-to-dict-in-PyDict_GetItem.patch
2016-09-12 22:21:49Tim Mitchellsetfiles: + 0001-issue27946-Hold-reference-to-dict-in-PyDict_GetItem.patch

nosy: + Tim Mitchell
messages: + msg276128

keywords: + patch
2016-09-02 21:48:01tehybelcreate