Issue 27826: Null-pointer dereference in tuplehash() function

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/72013

classification

Title:	Null-pointer dereference in tuplehash() function
Type:	crash	Stage:
Components:		Versions:	Python 3.6, Python 3.5

process

Status:	closed	Resolution:	wont fix
Dependencies:		Superseder:
Assigned To:	rhettinger	Nosy List:	artem.smotrakov, rhettinger, serhiy.storchaka, vstinner
Priority:	normal	Keywords:	patch

Created on 2016-08-22 00:12 by artem.smotrakov, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Files
File name	Uploaded	Description	Edit
marshal_tuplehash_null_dereference.py	artem.smotrakov, 2016-08-22 00:12	A test which reproduces the issue
tuplehash.patch	artem.smotrakov, 2016-08-22 00:26	Suggested patch for tuplehash() function	review

Messages (6)
msg273322 - (view)	Author: Artem Smotrakov (artem.smotrakov) *	Date: 2016-08-22 00:12
A null-pointer dereference may happen while deserialization incorrect data with marshal.loads() function. Here is a test which reproduces this (see also attached marshal_tuplehash_null_dereference.py): import marshal value = ( # tuple1 "this is a string", #string1 [ 1, # int1 2, # int2 3, # int3 4 # int4 ], ( #tuple2 "more tuples", #string2 1.0, # float1 2.3, # float2 4.5 # float3 ), "this is yet another string" ) dump = marshal.dumps(value) data = bytearray(dump) data[10] = 40 data[4] = 16 data[103] = 143 data[97] = 245 data[78] = 114 data[35] = 188 marshal.loads(bytes(data)) This code modifies the serialized data with the following: - update type of 'int2' element to TYPE_SET, 'int3' element becomes a length of the set - update 'float3' element to TYPE_REF which points to tuple1 Here is a stack trace reported by ASan: ASAN:SIGSEGV ================================================================= ==20296==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000008 (pc 0x000000582064 bp 0x7ffc9e581310 sp 0x7ffc9e5812f0 T0) #0 0x582063 in PyObject_Hash Objects/object.c:769 #1 0x5a3662 in tuplehash Objects/tupleobject.c:358 #2 0x5820ae in PyObject_Hash Objects/object.c:771 #3 0x5a3662 in tuplehash Objects/tupleobject.c:358 #4 0x5820ae in PyObject_Hash Objects/object.c:771 #5 0x58fac8 in set_add_key Objects/setobject.c:422 #6 0x59a85c in PySet_Add Objects/setobject.c:2323 #7 0x760d9d in r_object Python/marshal.c:1310 #8 0x76029d in r_object Python/marshal.c:1223 #9 0x760015 in r_object Python/marshal.c:1195 #10 0x7621dc in read_object Python/marshal.c:1465 #11 0x7639be in marshal_loads Python/marshal.c:1767 #12 0x577ff3 in PyCFunction_Call Objects/methodobject.c:109 #13 0x708a05 in call_function Python/ceval.c:4744 #14 0x6fb5a7 in PyEval_EvalFrameEx Python/ceval.c:3256 #15 0x70276f in _PyEval_EvalCodeWithName Python/ceval.c:4050 #16 0x70299f in PyEval_EvalCodeEx Python/ceval.c:4071 #17 0x6e07d7 in PyEval_EvalCode Python/ceval.c:778 #18 0x432354 in run_mod Python/pythonrun.c:980 #19 0x431e5b in PyRun_FileExFlags Python/pythonrun.c:933 #20 0x42e929 in PyRun_SimpleFileExFlags Python/pythonrun.c:396 #21 0x42caba in PyRun_AnyFileExFlags Python/pythonrun.c:80 #22 0x45f995 in run_file Modules/main.c:319 #23 0x4619c8 in Py_Main Modules/main.c:777 #24 0x41d258 in main Programs/python.c:69 #25 0x7f374629babf in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x20abf) #26 0x41ce28 in _start AddressSanitizer can not provide additional info. SUMMARY: AddressSanitizer: SEGV Objects/object.c:769 PyObject_Hash ==20296==ABORTING What happens when it tries to read int2 element: - int2 element is now a set of length 3 - add int4 element to the set - add tuple2 element -- when it adds an element to a set, it calculates a hash of the element -- when it calculates a hash of a tuple, it calculates hashes of all elements of the tuple -- while calculating a hash of tuple2, it calculates a hash of tuple1 since #float3 now is a TYPE_REF which points to tuple1 -- but tuple1 is not complete yet: length of tuple1 is 4, but only string1 was added to it -- tuplehash() function reads a length of a tuple, and then calls PyObject_Hash() for each element -- but it doesn't check if all elements were added to the tuple -- as a result, a null-pointer dereference happens in tuplehash() while reading second element of tuple1 https://hg.python.org/cpython/file/tip/Objects/tupleobject.c#l347 ... static Py_hash_t tuplehash(PyTupleObject v) { Py_uhash_t x; / Unsigned for defined overflow behavior. / Py_hash_t y; Py_ssize_t len = Py_SIZE(v); <= for tuple1 it returns 4, but tuple1 contains only one element (string1) PyObject p; Py_uhash_t mult = _PyHASH_MULTIPLIER; x = 0x345678UL; p = v->ob_item; while (--len >= 0) { y = PyObject_Hash(p++); <= null-pointer dereference happens here while reading second element ... I could reproduce it with python3.5, and latest build of https://hg.python.org/cpython (Aug 20th, 2016). Here is a simple patch which updates tuplehash() to check "p" for null: diff -r 6e6aa2054824 Objects/tupleobject.c --- a/Objects/tupleobject.c Sat Aug 20 21:22:03 2016 +0300 +++ b/Objects/tupleobject.c Sat Aug 20 23:17:16 2016 -0700 @@ -355,7 +355,13 @@ x = 0x345678UL; p = v->ob_item; while (--len >= 0) { - y = PyObject_Hash(p++); + PyObject next = p++; + if (next == NULL) { + PyErr_SetString(PyExc_TypeError, + "Cannot compute a hash, tuple seems to be invalid"); + return -1; + } + y = PyObject_Hash(next); if (y == -1) return -1; x = (x ^ y) mult; According to https://docs.python.org/3.4/library/marshal.html, the marshal module is not intended to be secure against erroneous or maliciously constructed data, but it might be better to avoid a crash here.
msg273325 - (view)	Author: Raymond Hettinger (rhettinger) *	Date: 2016-08-22 01:56
> but it might be better to avoid a crash here. I'm reluctant to introduce changes like this, especially in the middle of a loop. This code and code like it has been nonproblematic for Python's 26 year history. The code throughout tupleobject.c assumes well-formed tuples (for example, tuplecontains and tupleitem do not have NULL checks before dereferencing). At some point, the C code just has to trust its own structure invariants and people who run marshal or pickle have to trust their inputs.
msg273332 - (view)	Author: Raymond Hettinger (rhettinger) *	Date: 2016-08-22 03:45
It may be better to focus on Python/marshal.c to see if there are ways to make it more robust (at least checking to see if all of the n entries allocated in a container were actually filled).
msg273335 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) *	Date: 2016-08-22 05:50
The simplest example: import marshal t = [], t[0].append(t) b = marshal.dumps(t) b = bytearray(b) b[2] = b'<'[0] marshal.loads(b) Create a recursive tuple containing a list containing a reference to original tuple. Marshal it and replace TYPE_LIST ('[') by TYPE_SET ('<'). Now marshalled data contains a recursive tuple containing a set containing a reference to original tuple. When a tuple is added to a set, it still is not initialized, and hash is calculated on a uninitialized tuple. I believe it is not possible to create such structure without hacking marhal data or using C API. And it is hard to protect from such situation in marshal.c.
msg273338 - (view)	Author: STINNER Victor (vstinner) *	Date: 2016-08-22 07:50
> And it is hard to protect from such situation in marshal.c. Python doesn't validate marshal nor bytecode. It's a deliberate choice to get best performances.
msg273399 - (view)	Author: Raymond Hettinger (rhettinger) *	Date: 2016-08-22 19:52
Closing as won't fix. It is impractical to make marshal resilient against bytecode hacks and it is likewise impractical to put a NULL pointer check in-front of every dereference in the language.

History
Date	User	Action	Args
2022-04-11 14:58:35	admin	set	github: 72013
2016-08-22 19:52:14	rhettinger	set	status: open -> closed resolution: wont fix messages: + msg273399
2016-08-22 07:50:15	vstinner	set	nosy: + vstinner messages: + msg273338
2016-08-22 05:50:57	serhiy.storchaka	set	nosy: + serhiy.storchaka messages: + msg273335
2016-08-22 03:45:25	rhettinger	set	messages: + msg273332
2016-08-22 01:56:44	rhettinger	set	messages: + msg273325
2016-08-22 01:44:05	rhettinger	set	assignee: rhettinger nosy: + rhettinger
2016-08-22 00:26:51	artem.smotrakov	set	files: + tuplehash.patch keywords: + patch
2016-08-22 00:12:06	artem.smotrakov	create