This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: _sre: avoid relying on pointer overflow
Type: security Stage: resolved
Components: Extension Modules, Regular Expressions Versions: Python 3.3, Python 3.4, Python 2.7
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: serhiy.storchaka Nosy List: Nickolai.Zeldovich, ezio.melotti, mark.dickinson, mrabarnett, python-dev, serhiy.storchaka, vstinner
Priority: normal Keywords: patch

Created on 2013-01-22 15:54 by Nickolai.Zeldovich, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
pp.patch Nickolai.Zeldovich, 2013-01-22 15:54 Proposed patch for _sre.c review
pp-2.patch Nickolai.Zeldovich, 2013-03-10 18:54 review
pp-3.patch Nickolai.Zeldovich, 2013-03-11 14:38 review
Messages (14)
msg180403 - (view) Author: Nickolai Zeldovich (Nickolai.Zeldovich) * Date: 2013-01-22 15:54
Modules/_sre.c relies on pointer overflow in 5 places to check that the supplied offset does not cause wraparound when added to a base pointer; e.g.:

                    SRE_CODE prefix_len;
                    GET_ARG; prefix_len = arg;
                    GET_ARG;
                    /* Here comes the prefix string */
                    if (code+prefix_len < code || code+prefix_len > newcode)
                        FAIL;

however, pointer wraparound is undefined behavior in C, and gcc will optimize away (code+prefix_len < code) to (true), since prefix_len is an unsigned value.  This will happen with -O2 and even with -fwrapv:

nickolai@sahara:/tmp$ cat x.c
void bar();

void
foo(int *p, unsigned int x)
{
  if (p + x < p)
    bar();
}
nickolai@sahara:/tmp$ gcc x.c -S -o - -O2 -fwrapv
...
foo:
.LFB0:
	.cfi_startproc
	rep
	ret
	.cfi_endproc
...
nickolai@sahara:/tmp$ 

On a 32-bit platform with the development version of cpython, prefix_len seems to end up being an 'unsigned int', so I suspect that supplying a large prefix_len value (perhaps 0xffffffff) could lead to the subsequent loop writing garbage all over memory, or worse (but I have not tried to construct a concrete input that triggers this bug, so maybe there are some checks that make it difficult to trigger the bug).

In any case, this might be worth fixing -- the attached patch provides one proposed fix.  Another option might be to add -fno-strict-overflow to the gcc flags, which may be a reasonable additional measure to take, to avoid such problems biting Python in the future, but I would suggest doing this in addition to fixing the code (since not all compilers support such a flag to disable certain optimizations).
msg180411 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-01-22 16:50
LGTM.

There are other doubtful places, at lines: 658, 678, 1000, 1084, 2777, 3111.
msg180443 - (view) Author: Matthew Barnett (mrabarnett) * (Python triager) Date: 2013-01-23 03:03
Lines 1000 and 1084 will be a problem only if you're near the top of the address space. This is because:

1. ctx->pattern[1] will always be <= ctx->pattern[2].

2. A value of 65535 in ctx->pattern[2] means unlimited, even though SRE_CODE is now UCS4.

See also issue #13169.

If the 'unlimited' value is raised then fixing those lines will become more urgent.
msg180473 - (view) Author: Nickolai Zeldovich (Nickolai.Zeldovich) * Date: 2013-01-23 16:56
Lines 2777 and 3111 do indeed look suspect, because gcc can compile (ptr + offset < ptr) into (offset < 0):

nickolai@sahara:/tmp$ cat x.c 
void bar();

void
foo(char* ptr, int offset)
{
  if (ptr + offset < ptr)
    bar();
}
nickolai@sahara:/tmp$ gcc x.c -S -o - -O2
...
foo:
.LFB0:
	.cfi_startproc
	testl	%esi, %esi
	js	.L4
	rep
	ret
	.p2align 4,,10
	.p2align 3
.L4:
	xorl	%eax, %eax
	jmp	bar
	.cfi_endproc
...
nickolai@sahara:/tmp$ 

Lines 658, 678, 1000, 1084 are potentially problematic -- I don't know of current compilers that will do something unexpected, but it might be worth rewriting the code to avoid undefined behavior anyway.
msg180484 - (view) Author: Matthew Barnett (mrabarnett) * (Python triager) Date: 2013-01-23 18:13
You're checking "int offset", but what happens with "unsigned int offset"?
msg180485 - (view) Author: Nickolai Zeldovich (Nickolai.Zeldovich) * Date: 2013-01-23 18:14
For an unsigned int offset, see my original bug report: gcc eliminates the check altogether, since offset >= 0 by definition.
msg182231 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-02-16 15:49
Nickolai, are you want to update your patch with fixes for other possible pointer overflows? Note, that the maximal repetition number has been increased now.
msg183891 - (view) Author: Nickolai Zeldovich (Nickolai.Zeldovich) * Date: 2013-03-10 18:54
Sorry for the delay.  Attached is an updated patch that should fix all of the issues mentioned in this bug report.
msg183959 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-03-11 13:22
Nickolai, can you please submit a contributor form?

http://python.org/psf/contrib/contrib-form/
http://python.org/psf/contrib/
msg183960 - (view) Author: Nickolai Zeldovich (Nickolai.Zeldovich) * Date: 2013-03-11 13:31
I just submitted the contributor form -- thanks for the reminder.
msg183965 - (view) Author: Nickolai Zeldovich (Nickolai.Zeldovich) * Date: 2013-03-11 14:38
I get an HTTP error when trying to upload another patch through Rietveld, so here's a revised patch that avoids the need for Py_uintptr_t (thanks Serhiy).
msg184184 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-03-14 19:51
Of course it would be nice to have the tests for so much cases as possible, but I am afraid that it will not be easy. The patch LGTM.
msg186778 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2013-04-13 18:18
New changeset 27162465316f by Serhiy Storchaka in branch '2.7':
Issue #17016: Get rid of possible pointer wraparounds and integer overflows
http://hg.python.org/cpython/rev/27162465316f

New changeset 2673d207c524 by Serhiy Storchaka in branch '3.3':
Issue #17016: Get rid of possible pointer wraparounds and integer overflows
http://hg.python.org/cpython/rev/2673d207c524

New changeset f280786d0e64 by Serhiy Storchaka in branch 'default':
Issue #17016: Get rid of possible pointer wraparounds and integer overflows
http://hg.python.org/cpython/rev/f280786d0e64
msg186840 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-04-13 20:49
Thank you, Nickolai, for the patch.
History
Date User Action Args
2022-04-11 14:57:40adminsetgithub: 61218
2013-10-31 15:03:20vstinnersetnosy: + vstinner
2013-04-13 20:49:35serhiy.storchakasetmessages: + msg186840
2013-04-13 20:48:30serhiy.storchakasetstatus: open -> closed
stage: patch review -> resolved
resolution: fixed
components: + Extension Modules
versions: - Python 3.2
2013-04-13 18:18:52python-devsetnosy: + python-dev
messages: + msg186778
2013-03-14 19:51:09serhiy.storchakasetmessages: + msg184184
2013-03-11 14:38:42Nickolai.Zeldovichsetfiles: + pp-3.patch

messages: + msg183965
2013-03-11 13:31:44Nickolai.Zeldovichsetmessages: + msg183960
2013-03-11 13:22:56serhiy.storchakasetmessages: + msg183959
2013-03-10 18:54:30Nickolai.Zeldovichsetfiles: + pp-2.patch

messages: + msg183891
2013-02-16 15:49:44serhiy.storchakasetassignee: serhiy.storchaka
messages: + msg182231
2013-01-23 18:14:41Nickolai.Zeldovichsetmessages: + msg180485
2013-01-23 18:13:47mrabarnettsetmessages: + msg180484
2013-01-23 16:56:35Nickolai.Zeldovichsetmessages: + msg180473
2013-01-23 03:03:08mrabarnettsetmessages: + msg180443
2013-01-22 16:50:43serhiy.storchakasetmessages: + msg180411
2013-01-22 15:56:51ezio.melottisetnosy: + ezio.melotti, mark.dickinson, serhiy.storchaka, mrabarnett
stage: patch review

components: + Regular Expressions, - None
versions: + Python 2.7, Python 3.2, Python 3.3, Python 3.4, - Python 3.5
2013-01-22 15:54:28Nickolai.Zeldovichcreate