classification
Title: Various segfaults with dict
Type: crash Stage: resolved
Components: Interpreter Core Versions: Python 3.7, Python 3.6, Python 3.5, Python 3.4, Python 3.3, Python 2.7
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: serhiy.storchaka Nosy List: Tim Mitchell, benjamin.peterson, christian.heimes, duaneg, ebarry, georg.brandl, inada.naoki, larry, ned.deily, rhettinger, serhiy.storchaka, tehybel
Priority: Keywords: patch

Created on 2016-09-02 21:33 by tehybel, last changed 2017-07-26 03:08 by ned.deily. This issue is now closed.

Files
File name Uploaded Description Edit
0001-Issue-27945-fix-PyDict_MergeFromSeq2-use-after-free.patch duaneg, 2016-09-12 02:21
0001-Issue-27945-fix-dictitems_contains-use-after-free.patch duaneg, 2016-09-12 02:39 review
0001-Issue-27945-fix-dictiter_iternextitem-use-after-free.patch duaneg, 2016-09-12 04:29 review
0001-Issue-27945-Fixed-segfaults-in-dict.fromkeys-when-it.patch Tim Mitchell, 2016-09-12 04:46 review
27945-dict-segv-py27.patch inada.naoki, 2016-11-16 11:46 review
27945-dict-segv-py36.patch serhiy.storchaka, 2016-11-19 19:33 review
27945-dict-segv-py37-2.patch serhiy.storchaka, 2016-11-19 21:33 review
Pull Requests
URL Status Linked Edit
PR 1657 closed serhiy.storchaka, 2017-05-18 20:08
PR 1677 merged serhiy.storchaka, 2017-05-20 09:38
PR 1678 merged serhiy.storchaka, 2017-05-20 09:43
PR 1681 merged serhiy.storchaka, 2017-05-20 15:19
PR 2248 merged serhiy.storchaka, 2017-06-16 15:00
PR 2396 merged serhiy.storchaka, 2017-06-25 17:41
Messages (32)
msg274275 - (view) Author: tehybel (tehybel) Date: 2016-09-02 21:33
Here I'll describe five distinct issues I found. Common to them all is that they
reside in the built-in dictionary object. 

Four of them are use-after-frees and one is an array-out-of-bounds indexing bug.


All of the described functions reside in /Objects/dictobject.c.


Issue 1: use-after-free when initializing a dictionary

Initialization of a dictionary happens via the function dict_init which calls
dict_update_common. From there, PyDict_MergeFromSeq2 may be called, and that is
where this issue resides.

In PyDict_MergeFromSeq2 we retrieve a sequence of size 2 with this line:

	fast = PySequence_Fast(item, "");

After checking its size, we take out a key and value:

	key = PySequence_Fast_GET_ITEM(fast, 0);
	value = PySequence_Fast_GET_ITEM(fast, 1);

Then we call PyDict_GetItem. This calls back to Python code if the key has a
__hash__ function. From there the "item" sequence could get modified, resulting
in "key" or "value" getting used after having been freed.

Here's a PoC:

---

class X:
    def __hash__(self):
        pair[:] = []
        return 13

pair = [X(), 123]
dict([pair])

---

It crashes while trying to use freed memory as a PyObject:

(gdb) run ./poc24.py 
Program received signal SIGSEGV, Segmentation fault.
0x000000000048fe25 in insertdict (mp=mp@entry=0x7ffff6d5c4b8, key=key@entry=0x7ffff6d52538, hash=0xd, 
    value=value@entry=0x8d1ac0 <small_ints+6144>) at Objects/dictobject.c:831
831	    MAINTAIN_TRACKING(mp, key, value);
(gdb) print *key
$26 = {_ob_next = 0xdbdbdbdbdbdbdbdb, _ob_prev = 0xdbdbdbdbdbdbdbdb, ob_refcnt = 0xdbdbdbdbdbdbdbdb, 
  ob_type = 0xdbdbdbdbdbdbdbdb}




Issue 2: use-after-free in dictitems_contains

In the function dictitems_contains we call PyDict_GetItem to look up a value in
the dictionary:

	found = PyDict_GetItem((PyObject *)dv->dv_dict, key);

However this "found" variable is borrowed. We then go ahead and compare it:

	return PyObject_RichCompareBool(value, found, Py_EQ);

But PyObject_RichCompareBool could call back into Python code and e.g. release
the GIL. As a result, the dictionary may be mutated. Thus "found" could get
freed. 

Then, inside PyObject_RichCompareBool (actually in do_richcompare), the "found"
variable gets used after being freed.

PoC:

---

class X:
    def __eq__(self, other):
        d.clear()
        return NotImplemented

d = {0: set()}
(0, X()) in d.items()

---

Result:

(gdb) run ./poc25.py 
Program received signal SIGSEGV, Segmentation fault.
0x00000000004a03b6 in do_richcompare (v=v@entry=0x7ffff6d52468, w=w@entry=0x7ffff6ddf7c8, op=op@entry=0x2)
    at Objects/object.c:673
673	    if (!checked_reverse_op && (f = w->ob_type->tp_richcompare) != NULL) {
(gdb) print w->ob_type
$26 = (struct _typeobject *) 0xdbdbdbdbdbdbdbdb




Issue 3: use-after-free in dict_equal

In the function dict_equal, we call the "lookdict" function via
b->ma_keys->dk_lookup to look up a value:

	if ((b->ma_keys->dk_lookup)(b, key, ep->me_hash, &vaddr) == NULL)

This value's address is stored into the "vaddr" variable and the value is
fetched into the "bval" variable:

	bval = *vaddr;

Then we call Py_DECREF(key) which can call back into Python code. This could
release the GIL and mutate dictionary b. Therefore "bval" could become freed at
this point. We then proceed to use "bval":

	cmp = PyObject_RichCompareBool(aval, bval, Py_EQ);

This results in a use-after-free.

PoC:

---

class X():
    def __del__(self): 
        dict_b.clear()
    def __eq__(self, other):
        dict_a.clear()
        return True
    def __hash__(self): 
        return 13
        
dict_a = {X(): 0}
dict_b = {X(): X()}
dict_a == dict_b

---

Result:

(gdb) run ./poc26.py 
Program received signal SIGSEGV, Segmentation fault.
PyType_IsSubtype (a=0xdbdbdbdbdbdbdbdb, b=0x87ec60 <PyLong_Type>)
    at Objects/typeobject.c:1343
1343	    mro = a->tp_mro;
(gdb) print a
$59 = (PyTypeObject *) 0xdbdbdbdbdbdbdbdb



Issue 4: use-after-free in _PyDict_FromKeys

The function _PyDict_FromKeys takes an iterable as argument. If the iterable is
a dict, _PyDict_FromKeys loops over it like this:

	while (_PyDict_Next(iterable, &pos, &key, &oldvalue, &hash)) {
		if (insertdict(mp, key, hash, value)) {
			...
		}
	}

However if we look at the comment for PyDict_Next, we see this:

	 * CAUTION:  In general, it isn't safe to use PyDict_Next in a loop that
	 * mutates the dict.

But insertdict can call on to Python code which might mutate the dict. In that
case we perform a use-after-free of the "key" variable.

Here's a PoC:

---

class X(int):
    def __hash__(self):
        return 13 
    def __eq__(self, other):
        if len(d) > 1:
            d.clear()
        return False

d = {}
d = {X(1): 1, X(2): 2}
x = {}.fromkeys(d)

---

And the result:

(gdb) run ./poc27.py 
Program received signal SIGSEGV, Segmentation fault.
0x0000000000435122 in visit_decref (op=0x7ffff6d5ca68, data=0x0) at Modules/gcmodule.c:373
373	    if (PyObject_IS_GC(op)) {
(gdb) print *op
$115 = {_ob_next = 0xdbdbdbdbdbdbdbdb, _ob_prev = 0xdbdbdbdbdbdbdbdb, ob_refcnt = 0xdbdbdbdbdbdbdbdb, 
  ob_type = 0xdbdbdbdbdbdbdbdb}


An almost identical issue also exists further down in the function when calling
_PySet_NextEntry. To see this crash, just change "d" to be a set in the PoC
above:

	d = set()
	d = set([X(1), X(2)])

this likewise crashes with a use-after-free.



(Note: if you grep for PyDict_Next you will find more similar cases, although
many are in obscure modules or deprecated functions. I'm not sure those are
worth fixing? E.g. here's a crasher for BaseException_setstate which also calls
PyDict_Next:

---

class X(str):
    def __hash__(self):
        d.clear()
        return 13

d = {}
d[X()] = X()

e = Exception()
e.__setstate__(d)

---

end note.)




Issue 5: out-of-bounds indexing in dictiter_iternextitem

The function dictiter_iternextitem is used to iterate over a dictionary's items.
dictiter_iternextitem is careful to check that the dictionary did not change
size during iteration. However after performing this check, it calls Py_DECREF:

	Py_DECREF(PyTuple_GET_ITEM(result, 0));
	Py_DECREF(PyTuple_GET_ITEM(result, 1));

This can execute Python code and mutate the dict. If that happens, the index "i"
previously computed by dictiter_iternextitem could become invalid. It would then
index out of bounds with this line:

	key = d->ma_keys->dk_entries[i].me_key;

Furthermore the "value_ptr" variable would have gone stale, too. Taking the
"value" variable out of it uses memory that has been freed:

	value = *value_ptr;

Here's a PoC which crashes with the "value" variable being an arbitrary pointer:

---

class X(int):
    def __del__(self):
        d.clear()
    
d = {i: X(i) for i in range(8)}
    
for result in d.items():
    if result[0] == 2:
        d[2] = None # free d[2] --> X(2).__del__ is called

---

The result:

(gdb) run ./poc29.py 
Program received signal SIGSEGV, Segmentation fault.
dictiter_iternextitem (di=0x7ffff6d49cd8) at Objects/dictobject.c:3187
3187        Py_INCREF(key);
(gdb) print value
$12 = (PyObject *) 0x7b7b7b7b7b7b7b7b
msg275349 - (view) Author: Emanuel Barry (ebarry) * Date: 2016-09-09 16:57
Ping. The built-in dict was considerably changed in #27350; do any of these issues still persist?
msg275897 - (view) Author: Duane Griffin (duaneg) * Date: 2016-09-12 00:16
I cannot reproduce the segfaults for the first four issues however valgrind still reports problems for all but the second. The fifth (last) one still segfaults.

I have a patch for the fifth issue. The other remaining issues are all reporting the same invalid read at exit. I'm not sure whether that is the same issue reported here or something else that they all trigger:

Invalid read of size 8
    at 0x43C5FA: delete_garbage (gcmodule.c:868)
    by 0x43C5FA: collect (gcmodule.c:1019)
    by 0x43D2F0: _PyGC_CollectNoFail (gcmodule.c:1623)
    by 0x55D95E: PyImport_Cleanup (import.c:420)
    by 0x4224E5: Py_FinalizeEx.part.3 (pylifecycle.c:612)
    by 0x43B0CB: Py_Main (main.c:798)
    by 0x41DC71: main (python.c:69)
msg275901 - (view) Author: Duane Griffin (duaneg) * Date: 2016-09-12 00:40
Apologies: compiling python with --with-pydebug all of these issues are reproducible on head after all. Furthermore while my patch fixes the reported crash it still crashes on exit:

Program received signal SIGSEGV, Segmentation fault.
0x0000000000437193 in visit_decref (op=0x7ffff68a4c98, data=0x0) at Modules/gcmodule.c:374
374	    if (PyObject_IS_GC(op)) {
(gdb) bt
#0  0x0000000000437193 in visit_decref (op=0x7ffff68a4c98, data=0x0) at Modules/gcmodule.c:374
#1  0x00000000004a9112 in tupletraverse (o=0x7ffff68a49f8, visit=0x43716d <visit_decref>, arg=0x0) at Objects/tupleobject.c:571
#2  0x000000000043690a in subtract_refs (containers=containers@entry=0x87cda0 <generations+64>) at Modules/gcmodule.c:399
#3  0x0000000000437ac3 in collect (generation=generation@entry=2, n_collected=n_collected@entry=0x7fffffffd838, n_uncollectable=n_uncollectable@entry=0x7fffffffd840, 
    nofail=nofail@entry=0) at Modules/gcmodule.c:956
#4  0x0000000000437f57 in collect_with_callback (generation=generation@entry=2) at Modules/gcmodule.c:1128
#5  0x00000000004383a6 in PyGC_Collect () at Modules/gcmodule.c:1592
#6  0x00000000004383cf in _PyGC_CollectIfEnabled () at Modules/gcmodule.c:1605
#7  0x0000000000420c76 in Py_FinalizeEx () at Python/pylifecycle.c:603
#8  0x000000000043682b in Py_Main (argc=argc@entry=2, argv=argv@entry=0x90d010) at Modules/main.c:798
#9  0x000000000041d153 in main (argc=2, argv=0x7fffffffdae8) at ./Programs/python.c:69
msg275919 - (view) Author: Duane Griffin (duaneg) * Date: 2016-09-12 02:21
Fix for the first issue: with this fix there is no segfault or valgrind issue reported on during execution or on exit.
msg275924 - (view) Author: Duane Griffin (duaneg) * Date: 2016-09-12 02:39
Fix for the second issue: with this fix there is no segfault or valgrind issue reported on during execution or on exit.
msg275951 - (view) Author: Duane Griffin (duaneg) * Date: 2016-09-12 04:29
Ah, my first fix (for the fifth issue) was incomplete. Please see attached patch which I think correctly fixes the problem.
msg275957 - (view) Author: Tim Mitchell (Tim Mitchell) * Date: 2016-09-12 04:46
Here is my patch for parts 3 and 4.
Core issue for part 4 appears to be dk_lookup calling arbitrary python code may free the key.
dk_lookup is also used in _PyDict_LoadGlobal not sure if this bug can occur here.
msg275960 - (view) Author: Duane Griffin (duaneg) * Date: 2016-09-12 04:59
Note that I think most or all of these issues apply to 2.7 and while I didn't do a proper check I think the fixes also apply.
msg280931 - (view) Author: INADA Naoki (inada.naoki) * (Python committer) Date: 2016-11-16 10:13
0001-Issue-27945-fix-PyDict_MergeFromSeq2-use-after-free.patch:

LGTM. I've checked it can be applied to 2.7 and 3.5 branch and passes
`make quicktest`.
msg280932 - (view) Author: INADA Naoki (inada.naoki) * (Python committer) Date: 2016-11-16 10:27
0001-Issue-27945-fix-dictitems_contains-use-after-free.patch

LGTM.
This patch can be applied to 2.7 and 3.5, without conflict against previous patch.
It passes `make quicktest`.
msg280937 - (view) Author: INADA Naoki (inada.naoki) * (Python committer) Date: 2016-11-16 11:46
0001-Issue-27945-fix-dictiter_iternextitem-use-after-free.patch	LGTM and OK too.
But 0001-Issue-27945-Fixed-segfaults-in-dict.fromkeys-when-it.patch	cause conflict.

I want to commit first three patches.
For another reviewer, here is the patch merging three patches. Please review it.
msg280940 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-11-16 12:46
I worry about performance. Additional increment/decrement of reference counter can add significant overhead in critical places of Python interpreter. Before committing any patches we should measure the performance lost.
msg280943 - (view) Author: INADA Naoki (inada.naoki) * (Python committer) Date: 2016-11-16 13:20
OK, I'll run benchmark in this week.
But three patches seems don't affects to performance critical APIs.
msg281034 - (view) Author: INADA Naoki (inada.naoki) * (Python committer) Date: 2016-11-17 12:17
I run performance 0.5.0 on Python 3.5.
Since it took long time even without -b all option, I haven't run it for Python 2.7 yet.

On Python 3.5:

$ ./venv/cpython3.5-846d5b1f0b61/bin/python -m perf compare_to py35-master.json py35-patched.json  -G
Slower (14):
- logging_silent: 2.92 us +- 0.15 us -> 3.05 us +- 0.18 us: 1.04x slower
- pickle: 93.2 us +- 4.7 us -> 95.8 us +- 5.0 us: 1.03x slower
- python_startup: 72.6 ms +- 4.7 ms -> 74.6 ms +- 4.8 ms: 1.03x slower
- meteor_contest: 766 ms +- 13 ms -> 779 ms +- 15 ms: 1.02x slower
- scimark_sor: 2.00 sec +- 0.04 sec -> 2.04 sec +- 0.03 sec: 1.02x slower
- fannkuch: 3.86 sec +- 0.05 sec -> 3.91 sec +- 0.04 sec: 1.01x slower
- python_startup_no_site: 37.4 ms +- 2.4 ms -> 38.0 ms +- 2.6 ms: 1.01x slower
- regex_dna: 958 ms +- 13 ms -> 973 ms +- 16 ms: 1.01x slower
- regex_compile: 1.41 sec +- 0.02 sec -> 1.43 sec +- 0.02 sec: 1.01x slower
- raytrace: 5.46 sec +- 0.06 sec -> 5.52 sec +- 0.07 sec: 1.01x slower
- scimark_lu: 1.88 sec +- 0.09 sec -> 1.90 sec +- 0.12 sec: 1.01x slower
- nbody: 887 ms +- 19 ms -> 895 ms +- 21 ms: 1.01x slower
- 2to3: 3.01 sec +- 0.03 sec -> 3.03 sec +- 0.04 sec: 1.01x slower
- scimark_fft: 2.52 sec +- 0.05 sec -> 2.54 sec +- 0.04 sec: 1.01x slower

Faster (16):
- scimark_sparse_mat_mult: 35.0 ms +- 1.8 ms -> 32.7 ms +- 2.0 ms: 1.07x faster
- xml_etree_process: 1.13 sec +- 0.03 sec -> 1.07 sec +- 0.03 sec: 1.06x faster
- xml_etree_generate: 1.35 sec +- 0.03 sec -> 1.29 sec +- 0.03 sec: 1.05x faster
- mako: 154 ms +- 8 ms -> 149 ms +- 7 ms: 1.04x faster
- telco: 87.4 ms +- 3.7 ms -> 84.2 ms +- 4.1 ms: 1.04x faster
- tornado_http: 1.75 sec +- 0.03 sec -> 1.70 sec +- 0.03 sec: 1.03x faster
- call_simple: 45.5 ms +- 2.1 ms -> 44.3 ms +- 1.4 ms: 1.03x faster
- sympy_integrate: 202 ms +- 9 ms -> 198 ms +- 7 ms: 1.02x faster
- sympy_str: 2.08 sec +- 0.06 sec -> 2.04 sec +- 0.06 sec: 1.02x faster
- django_template: 1.65 sec +- 0.03 sec -> 1.61 sec +- 0.03 sec: 1.02x faster
- call_method_unknown: 65.1 ms +- 0.9 ms -> 64.3 ms +- 0.9 ms: 1.01x faster
- xml_etree_parse: 1.34 sec +- 0.04 sec -> 1.32 sec +- 0.04 sec: 1.01x faster
- sympy_sum: 1.01 sec +- 0.03 sec -> 998 ms +- 35 ms: 1.01x faster
- call_method_slots: 59.3 ms +- 0.8 ms -> 58.6 ms +- 1.0 ms: 1.01x faster
- float: 1.24 sec +- 0.03 sec -> 1.23 sec +- 0.02 sec: 1.01x faster
- chaos: 1.20 sec +- 0.02 sec -> 1.19 sec +- 0.02 sec: 1.01x faster

Benchmark hidden because not significant (31): call_method, chameleon, crypto_pyaes, deltablue, dulwich_log, genshi_text, genshi_xml, go, hexiom, html5lib, json_dumps, json_loads, logging_format, logging_simple, nqueens, pathlib, pickle_dict, pickle_list, pickle_pure_python, pidigits, regex_effbot, regex_v8, richards, scimark_monte_carlo, spectral_norm, sympy_expand, unpack_sequence, unpickle, unpickle_list, unpickle_pure_python, xml_etree_iterparse
msg281035 - (view) Author: INADA Naoki (inada.naoki) * (Python committer) Date: 2016-11-17 12:25
Only patch which affects to hot loop is:

--- a/Objects/dictobject.c	Tue Nov 15 21:21:35 2016 -0500
+++ b/Objects/dictobject.c	Wed Nov 16 11:40:51 2016 +0000
@@ -1550,11 +1550,18 @@ PyDict_MergeFromSeq2(PyObject *d, PyObje
         /* Update/merge with this (key, value) pair. */
         key = PySequence_Fast_GET_ITEM(fast, 0);
         value = PySequence_Fast_GET_ITEM(fast, 1);
+        Py_INCREF(key);
+        Py_INCREF(value);
         if (override || PyDict_GetItem(d, key) == NULL) {
             int status = PyDict_SetItem(d, key, value);
-            if (status < 0)
+            if (status < 0) {
+                Py_DECREF(key);
+                Py_DECREF(value);
                 goto Fail;
+            }
         }
+        Py_DECREF(key);
+        Py_DECREF(value);
         Py_DECREF(fast);
         Py_DECREF(item);
     }

Microbenchmark for it:

$ ~/local/py35/bin/master -m perf timeit --rigorous -t --python ~/local/py35/bin/patched --compare-to ~/local/py35/bin/master -s 'L = [(i,i) for i in range(10000)]' -- 'dict.fromkeys(L)'
Benchmark master
================

.........................................Total duration: 21.2 sec
Start date: 2016-11-17 12:21:39
End date: 2016-11-17 12:22:13
Raw sample minimum: 109 ms
Raw sample maximum: 144 ms

Number of runs: 41
Total number of samples: 120
Number of samples per run: 3
Number of warmups per run: 1
Loop iterations per sample: 32

Minimum: 3.41 ms (-13%)
Median +- std dev: 3.92 ms +- 0.19 ms
Mean +- std dev: 3.92 ms +- 0.19 ms
Maximum: 4.50 ms (+15%)

Median +- std dev: 3.92 ms +- 0.19 ms

Benchmark patched
=================

.........................................Total duration: 21.3 sec
Start date: 2016-11-17 12:22:13
End date: 2016-11-17 12:22:47
Raw sample minimum: 108 ms
Raw sample maximum: 152 ms

Number of runs: 41
Total number of samples: 120
Number of samples per run: 3
Number of warmups per run: 1
Loop iterations per sample: 32

Minimum: 3.39 ms (-14%)
Median +- std dev: 3.92 ms +- 0.23 ms
Mean +- std dev: 3.95 ms +- 0.23 ms
Maximum: 4.74 ms (+21%)

Median +- std dev: 3.92 ms +- 0.23 ms

Compare
=======

Median +- std dev: [master] 3.92 ms +- 0.19 ms -> [patched] 3.92 ms +- 0.23 ms: 1.00x slower
Not significant!
msg281037 - (view) Author: INADA Naoki (inada.naoki) * (Python committer) Date: 2016-11-17 12:34
I'm sorry, dict.fromkeys() didn't use PyDict_MergeFromSeq2().

This may be microbench for worst case:

$ ~/local/py35/bin/master -m perf timeit --rigorous  --python ~/local/py35/bin/patched --compare-to ~/local/py35/bin/master -s 'L = [(i,i) for i in range(10000)]' -- 'dict(L)'
master: ......................................... 2.06 ms +- 0.11 ms
patched: ......................................... 2.16 ms +- 0.09 ms

Median +- std dev: [master] 2.06 ms +- 0.11 ms -> [patched] 2.16 ms +- 0.09 ms: 1.05x slower

$ ~/local/py27/bin/master -m perf timeit --rigorous  --python ~/local/py27/bin/patched --compare-to ~/local/py27/bin/master -s 'L = [(i,i) for i in range(10000)]' -- 'dict(L)'
master: ......................................... 1.48 ms +- 0.06 ms
patched: ......................................... 1.57 ms +- 0.09 ms

Median +- std dev: [master] 1.48 ms +- 0.06 ms -> [patched] 1.57 ms +- 0.09 ms: 1.06x slower
msg281039 - (view) Author: INADA Naoki (inada.naoki) * (Python committer) Date: 2016-11-17 13:13
I modified the patch to avoid incref&decref when pair is not list, because tuple is common for such case.
But I can't get back original performance.

(python 2.7 is modified version of patch)

inada-n@test1:~/work/bench$ ~/local/py27/bin/master -m perf timeit --rigorous  --python ~/local/py27/bin/python2.7 --compare-to ~/local/py27/bin/master -s 'L = [(i,i) for i in range(10000)]' -- 'dict(L)'
master: ......................................... 1.47 ms +- 0.06 ms
python2.7: ......................................... 1.55 ms +- 0.06 ms

Median +- std dev: [master] 1.47 ms +- 0.06 ms -> [python2.7] 1.55 ms +- 0.06 ms: 1.05x slower
inada-n@test1:~/work/bench$ ~/local/py27/bin/master -m perf timeit --rigorous  --python ~/local/py27/bin/python2.7 --compare-to ~/local/py27/bin/patched -s 'L = [(i,i) for i in range(10000)]' -- 'dict(L)'
patched: ......................................... 1.56 ms +- 0.08 ms
python2.7: ......................................... 1.55 ms +- 0.09 ms

Median +- std dev: [patched] 1.56 ms +- 0.08 ms -> [python2.7] 1.55 ms +- 0.09 ms: 1.01x faster
Not significant!
msg281228 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-11-19 19:33
Here is a consolidated patch for review.
msg281229 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-11-19 21:33
The change to dict_equal() LGTM. It doesn't add an overhead.

For dictiter_iternextitem() I propose other change. It doesn't add an overhead.

There are bugs in the patch for _PyDict_FromKeys().

The change to dictitems_contains() adds an overhead, but it is small and seems unavoidable.

I wondering whether it is worth to use PySequence_Tuple() instead of PySequence_Fast() in PyDict_MergeFromSeq2(). This would add a cost of creating new tuple if items are not tuples, but save increfing/decrefing the key and the value in common case.

I have doubts about insertdict(). Additional incref duplicates increfs in dict_merge(). Is it worth to move it outside of insertdict()?

I have doubts about _PyDict_FromKeys(). It seems to me that it is safe to use _PyDict_Next() in a loop that mutates the dict (not checked _PySet_Next()). With guarded insertdict() additional check is not needed (and it doesn't help against clearing the dict inside insertdict()).

In updated patch fixed some bugs and used different solution for dictiter_iternextitem().
msg281328 - (view) Author: INADA Naoki (inada.naoki) * (Python committer) Date: 2016-11-21 07:45
LGTM.

Performance on Azure VM (AMD Opteron(tm) Processor 4171 HE):

$ ~/local/py36/bin/patched -m perf compare_to master.json patched.json -G
Slower (10):
- spectral_norm: 915 ms +- 17 ms -> 967 ms +- 25 ms: 1.06x slower
- nbody: 774 ms +- 28 ms -> 805 ms +- 22 ms: 1.04x slower
- regex_dna: 965 ms +- 18 ms -> 993 ms +- 22 ms: 1.03x slower
- go: 1.93 sec +- 0.05 sec -> 1.99 sec +- 0.06 sec: 1.03x slower
- python_startup_no_site: 29.7 ms +- 2.0 ms -> 30.5 ms +- 2.0 ms: 1.03x slower
- xml_etree_parse: 1.02 sec +- 0.03 sec -> 1.04 sec +- 0.04 sec: 1.02x slower
- python_startup: 52.7 ms +- 3.6 ms -> 53.7 ms +- 3.6 ms: 1.02x slower
- xml_etree_generate: 962 ms +- 24 ms -> 978 ms +- 25 ms: 1.02x slower
- pickle_dict: 188 us +- 8 us -> 191 us +- 8 us: 1.02x slower
- scimark_fft: 2.19 sec +- 0.04 sec -> 2.20 sec +- 0.05 sec: 1.00x slower

Faster (26):
- fannkuch: 3.76 sec +- 0.07 sec -> 3.55 sec +- 0.09 sec: 1.06x faster
- call_method_slots: 59.4 ms +- 9.0 ms -> 56.6 ms +- 3.4 ms: 1.05x faster
- sqlite_synth: 27.6 us +- 1.4 us -> 26.4 us +- 1.2 us: 1.05x faster
- nqueens: 852 ms +- 19 ms -> 813 ms +- 20 ms: 1.05x faster
- django_template: 1.35 sec +- 0.04 sec -> 1.29 sec +- 0.03 sec: 1.05x faster
- unpack_sequence: 407 ns +- 16 ns -> 390 ns +- 16 ns: 1.04x faster
- call_method: 60.6 ms +- 1.4 ms -> 58.1 ms +- 1.6 ms: 1.04x faster
- xml_etree_iterparse: 918 ms +- 29 ms -> 881 ms +- 32 ms: 1.04x faster
- unpickle_pure_python: 3.44 ms +- 0.14 ms -> 3.31 ms +- 0.19 ms: 1.04x faster
- richards: 608 ms +- 20 ms -> 585 ms +- 15 ms: 1.04x faster
- chaos: 921 ms +- 22 ms -> 891 ms +- 24 ms: 1.03x faster
- mako: 189 ms +- 8 ms -> 183 ms +- 8 ms: 1.03x faster
- meteor_contest: 699 ms +- 23 ms -> 677 ms +- 26 ms: 1.03x faster
- regex_compile: 1.52 sec +- 0.03 sec -> 1.48 sec +- 0.04 sec: 1.03x faster
- chameleon: 101 ms +- 5 ms -> 97.8 ms +- 4.0 ms: 1.03x faster
- regex_v8: 153 ms +- 7 ms -> 149 ms +- 6 ms: 1.03x faster
- 2to3: 2.59 sec +- 0.05 sec -> 2.52 sec +- 0.05 sec: 1.03x faster
- crypto_pyaes: 797 ms +- 20 ms -> 776 ms +- 18 ms: 1.03x faster
- genshi_xml: 653 ms +- 23 ms -> 637 ms +- 27 ms: 1.03x faster
- pickle_pure_python: 4.54 ms +- 0.19 ms -> 4.42 ms +- 0.20 ms: 1.03x faster
- regex_effbot: 18.4 ms +- 0.7 ms -> 18.1 ms +- 0.6 ms: 1.02x faster
- unpickle: 107 us +- 4 us -> 105 us +- 5 us: 1.02x faster
- deltablue: 61.2 ms +- 3.4 ms -> 60.0 ms +- 3.5 ms: 1.02x faster
- raytrace: 4.34 sec +- 0.14 sec -> 4.26 sec +- 0.10 sec: 1.02x faster
- telco: 63.9 ms +- 3.1 ms -> 62.9 ms +- 3.2 ms: 1.02x faster
- pidigits: 965 ms +- 14 ms -> 955 ms +- 20 ms: 1.01x faster

Benchmark hidden because not significant (28): call_method_unknown, call_simple, dulwich_log, float, genshi_text, hexiom, html5lib, json_dumps, json_loads, logging_format, logging_silent, logging_simple, pathlib, pickle, pickle_list, scimark_lu, scimark_monte_carlo, scimark_sor, scimark_sparse_mat_mult, sqlalchemy_declarative, sqlalchemy_imperative, sympy_expand, sympy_integrate, sympy_str, sympy_sum, tornado_http, unpickle_list, xml_etree_process
msg281944 - (view) Author: Ned Deily (ned.deily) * (Python committer) Date: 2016-11-29 04:09
Where do we stand on this issue?  At the moment, 3.6.0 is going to be released without these fixes.
msg281946 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-11-29 04:58
This issue is severe, but I don't consider it as release critical for 3.6.0. The patch fixes segfaults, but it can add unneeded overhead, and the dict performance is critical for Python core. The segfaults are not new. I'm trying to minimize the overhead of changes, but this doesn't suffer a haste.
msg294022 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-05-20 09:30
New changeset 753bca3934a7618a4fa96e107ad1c5c18633a683 by Serhiy Storchaka in branch 'master':
bpo-27945: Fixed various segfaults with dict. (#1657)
https://github.com/python/cpython/commit/753bca3934a7618a4fa96e107ad1c5c18633a683
msg294024 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-05-20 10:06
New changeset 564398af6ccb34d0db8b6e2537830eca285689e5 by Serhiy Storchaka in branch '3.6':
[3.6] bpo-27945: Fixed various segfaults with dict. (GH-1657) (#1677)
https://github.com/python/cpython/commit/564398af6ccb34d0db8b6e2537830eca285689e5
msg294025 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-05-20 10:06
New changeset 2f7f533cf6fb57fcedcbc7bd454ac59fbaf2c655 by Serhiy Storchaka in branch '3.5':
[3.5] bpo-27945: Fixed various segfaults with dict. (GH-1657) (#1678)
https://github.com/python/cpython/commit/2f7f533cf6fb57fcedcbc7bd454ac59fbaf2c655
msg294035 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-05-20 17:05
New changeset e6a0b5982973e64b9fa28e5e3e54eb8c47882780 by Serhiy Storchaka in branch '2.7':
[2.7] bpo-27945: Fixed various segfaults with dict. (GH-1657) (#1681)
https://github.com/python/cpython/commit/e6a0b5982973e64b9fa28e5e3e54eb8c47882780
msg294037 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-05-20 17:22
Thank you for your patches Duane and Tim. Thank you for your very detailed report that allow writing these patches tehybel.

I excluded changes in dict.fromkeys() since they look unnecessary for this issue after fixing insertdict(). There are other reasons for adding these changes (this makes the behavior with exact dicts and sets and their subclasses more consistent), but this is other issue. There are also reasons for not adding them.

We need to check also the set implementation whether it is vulnerable to similar issues.
msg297456 - (view) Author: Ned Deily (ned.deily) * (Python committer) Date: 2017-06-30 23:55
Since Serhiy created backport PRs for 3.4 and 3.3, I'm reopening the issue and marking it as Release Blocker (for those releases) so we don't lose track of them and agree they meet the criteria for security-fix-only releases.
msg297474 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-07-01 05:53
For the context see issue30484. If the segfault is caused by garbage collection this perhaps is not an expluatable vulnerability, but a severe bug that can affect any multithread application and doesn't have a workaround.
msg298159 - (view) Author: Larry Hastings (larry) * (Python committer) Date: 2017-07-11 13:30
New changeset f7344798e57da6b9c4ed9372e8eaecde80989c86 by larryhastings (Serhiy Storchaka) in branch '3.4':
[3.4] [3.5] bpo-27945: Fixed various segfaults with dict. (GH-1657) (GH-1678) (#2248)
https://github.com/python/cpython/commit/f7344798e57da6b9c4ed9372e8eaecde80989c86
msg299197 - (view) Author: Ned Deily (ned.deily) * (Python committer) Date: 2017-07-26 03:07
New changeset 8fbdab50fc8f2b71f19b54f3a0208cfbf2be7713 by Ned Deily (Serhiy Storchaka) in branch '3.3':
[3.3] [3.5] bpo-27945: Fixed various segfaults with dict. (GH-1657) (GH-1678) (#2396)
https://github.com/python/cpython/commit/8fbdab50fc8f2b71f19b54f3a0208cfbf2be7713
History
Date User Action Args
2017-07-26 03:08:42ned.deilysetpriority: release blocker ->
status: open -> closed
resolution: fixed
stage: commit review -> resolved
2017-07-26 03:07:32ned.deilysetmessages: + msg299197
2017-07-11 13:30:23larrysetmessages: + msg298159
2017-07-01 05:53:02serhiy.storchakasetmessages: + msg297474
2017-06-30 23:55:51ned.deilysetstatus: closed -> open
priority: critical -> release blocker

versions: + Python 3.3, Python 3.4
nosy: + benjamin.peterson, georg.brandl

messages: + msg297456
resolution: fixed -> (no value)
stage: resolved -> commit review
2017-06-25 17:41:43serhiy.storchakasetpull_requests: + pull_request2442
2017-06-16 15:00:11serhiy.storchakasetpull_requests: + pull_request2296
2017-05-20 17:22:27serhiy.storchakasetstatus: open -> closed
resolution: fixed
messages: + msg294037

stage: patch review -> resolved
2017-05-20 17:05:29serhiy.storchakasetmessages: + msg294035
2017-05-20 15:19:47serhiy.storchakasetpull_requests: + pull_request1776
2017-05-20 10:06:49serhiy.storchakasetmessages: + msg294025
2017-05-20 10:06:28serhiy.storchakasetmessages: + msg294024
2017-05-20 09:43:15serhiy.storchakasetpull_requests: + pull_request1773
2017-05-20 09:38:20serhiy.storchakasetpull_requests: + pull_request1772
2017-05-20 09:30:04serhiy.storchakasetmessages: + msg294022
2017-05-18 20:08:15serhiy.storchakasetpull_requests: + pull_request1751
2016-11-29 04:58:23serhiy.storchakasetmessages: + msg281946
2016-11-29 04:09:33ned.deilysetmessages: + msg281944
2016-11-21 07:45:38inada.naokisetmessages: + msg281328
2016-11-19 21:33:29serhiy.storchakasetfiles: + 27945-dict-segv-py37-2.patch

messages: + msg281229
2016-11-19 19:33:10serhiy.storchakasetfiles: + 27945-dict-segv-py36.patch

messages: + msg281228
2016-11-17 13:13:33inada.naokisetmessages: + msg281039
2016-11-17 12:34:56inada.naokisetmessages: + msg281037
2016-11-17 12:25:47inada.naokisetmessages: + msg281035
2016-11-17 12:17:54inada.naokisetmessages: + msg281034
2016-11-16 13:20:00inada.naokisetmessages: + msg280943
2016-11-16 12:46:29serhiy.storchakasetmessages: + msg280940
2016-11-16 11:46:18inada.naokisetfiles: + 27945-dict-segv-py27.patch

messages: + msg280937
2016-11-16 10:27:31inada.naokisetmessages: + msg280932
2016-11-16 10:13:31inada.naokisetmessages: + msg280931
2016-10-09 12:43:43serhiy.storchakasetnosy: + inada.naoki

versions: + Python 2.7, Python 3.7
2016-09-12 04:59:56duanegsetmessages: + msg275960
2016-09-12 04:46:43Tim Mitchellsetfiles: + 0001-Issue-27945-Fixed-segfaults-in-dict.fromkeys-when-it.patch
nosy: + Tim Mitchell
messages: + msg275957

2016-09-12 04:38:06serhiy.storchakasetassignee: serhiy.storchaka
stage: needs patch -> patch review
2016-09-12 04:30:24duanegsetfiles: - 0001-Issue-27945-fix-dictiter_iternextitem-use-after-free.patch
2016-09-12 04:29:53duanegsetfiles: + 0001-Issue-27945-fix-dictiter_iternextitem-use-after-free.patch

messages: + msg275951
2016-09-12 02:39:08duanegsetfiles: + 0001-Issue-27945-fix-dictitems_contains-use-after-free.patch

messages: + msg275924
2016-09-12 02:21:35duanegsetfiles: + 0001-Issue-27945-fix-PyDict_MergeFromSeq2-use-after-free.patch

messages: + msg275919
2016-09-12 00:40:56duanegsetmessages: + msg275901
2016-09-12 00:16:37duanegsetfiles: + 0001-Issue-27945-fix-dictiter_iternextitem-use-after-free.patch

nosy: + duaneg
messages: + msg275897

keywords: + patch
2016-09-09 16:57:45ebarrysetmessages: + msg275349
2016-09-02 22:26:01christian.heimessetnosy: + christian.heimes
2016-09-02 21:45:42ebarrysetnosy: + rhettinger, larry, ned.deily, serhiy.storchaka, ebarry
title: five dictobject issues -> Various segfaults with dict
priority: normal -> critical
type: crash
stage: needs patch
2016-09-02 21:34:33tehybelsetversions: + Python 3.5, Python 3.6
2016-09-02 21:33:28tehybelcreate