classification
Title: PyUnicode_FromFormatV crasher
Type: crash Stage: resolved
Components: Interpreter Core Versions: Python 3.2, Python 3.3, Python 2.7
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: gvanrossum Nosy List: Arfrever, Guido, georg.brandl, gvanrossum, python-dev, serhiy.storchaka, skrah, vstinner
Priority: normal Keywords: needs review, patch

Created on 2014-12-15 17:18 by gvanrossum, last changed 2015-02-01 11:13 by Arfrever. This issue is now closed.

Files
File name Uploaded Description Edit
vranken.diff gvanrossum, 2014-12-15 17:18 review
issue23055-2.7.patch serhiy.storchaka, 2014-12-15 23:38 review
issue23055-2.7-2.patch serhiy.storchaka, 2014-12-16 12:31 review
issue23055-3.2.patch serhiy.storchaka, 2014-12-16 12:31 review
issue23055-3.3.patch serhiy.storchaka, 2014-12-16 12:31 review
issue23055_2.patch serhiy.storchaka, 2015-01-30 22:20 review
Messages (18)
msg232673 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2014-12-15 17:18
Fix as reported by Guido Vranken on security@python.org, with minimal test by me.

Needs:

- review
- port to 3.2--3.5

Questions:

- Does my test case cover all changed code?
msg232689 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-12-15 23:38
I have added a couple of comments. Here is a patch which fixes found bugs.

3.4+ is not affected by this bug. 3.2 looks same as 2.7 and is affected, 3.3 uses different code but at first glance looks affected too. Is it worth to fix this bug in 3.2 and 3.3?
msg232690 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2014-12-15 23:46
Thank you for digging into this! I'd say go ahead and update 3.2 and 3.3 too -- these are in security-fix-only mode meaning that we only fix security issues and don't do actual releases. But we still do security bugfixes: for 3.2 until February 2016 (PEP 392), for 3.3 until September 2017 (PEP 398).

I don't think that this warrants changing the 2.7 bugfix release schedule.
msg232691 - (view) Author: Guido Vranken (Guido) Date: 2014-12-16 00:25
Serhiy Storchaka: good call on changing my 'n += (width + precision) < 20 ? 20 : (width + precision);' into 'if (width < precision) width = precision;', I didn't realize that sprintf's space requirement entails using the largest of the two instead of adding the two together.

I noticed the apparently pointless width calculation in 'step 1' but decided not to touch it -- good that it's removed now though.

I will start doing more debugging based on this new patch now to ensure that the bug is gone now.

On a more design-related note, for the sake of readability and stability, I'd personally opt for implementing toned-down custom sprintf-like function that does exactly what it needs to do and nothing more, since a function like this one requires perfect alignment with the underlying sprintf() in terms of functionality, at the possible expense of stability and integrity issues like we see here. For instance, width and precision are currently overflowable, resulting in either a minus sign appearing in the resulant format string given to sprintf() (width and precision are signed integers), or completely overflowing it (ie. (uint64_t)18446744073709551617 == 1 ). Considering the latter example, how do we know sprintf uses the same logic?

Guido
msg232692 - (view) Author: Guido Vranken (Guido) Date: 2014-12-16 00:39
I'd also like to add that, although I agree with Guido van Rossum that the likelihood of even triggering this bug in a general programming context is low, there are two buffer overflows at play here (one stack-based and one heap-based), and given an adversary's control over the format and vargs parameters, I'd there is a reasonable likelihood of exploiting it to execute arbitrary code, since the one controlling the parameters has some control as to which bytes end up where outside buffer boundaries.
msg232694 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2014-12-16 01:09
I'd be much worried about attack scenarios if this function was part of the standard library. But it's not -- the stdlib's % operator uses completely different code. The most common use case is probably to generate error messages from extension modules -- and there the format is almost always a literal in the C code. (An adversary who can load a C extension doesn't need this exploit.)
msg232737 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-12-16 12:31
Here is updated patch for 2.7 (backported tests from 3.5), patches for 3.2 and 
3.3.
msg234846 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-01-27 19:40
Georg, what is your word as release manager of 3.2/3.3? I would commit the patch in 2.7 if there are no objections.
msg234847 - (view) Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2015-01-27 20:01
It's fine to commit it to both branches.
msg234848 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2015-01-27 20:33
New changeset e6b9e277fbf4 by Serhiy Storchaka in branch '2.7':
Issue #23055: Fixed a buffer overflow in PyUnicode_FromFormatV.  Analysis
https://hg.python.org/cpython/rev/e6b9e277fbf4

New changeset f849f937f78c by Serhiy Storchaka in branch '3.2':
Issue #23055: Fixed a buffer overflow in PyUnicode_FromFormatV.  Analysis
https://hg.python.org/cpython/rev/f849f937f78c

New changeset 6caed177a028 by Serhiy Storchaka in branch '3.3':
Issue #23055: Fixed a buffer overflow in PyUnicode_FromFormatV.  Analysis
https://hg.python.org/cpython/rev/6caed177a028
msg235051 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2015-01-30 19:44
I think in 2.7 there's a slight problem since e6b9e277fbf4:

[1/1] test_unicode
Debug memory block at address p=0x7f4ebba3fae0: API 'o'
    100 bytes originally requested
    The 7 pad bytes at p-7 are FORBIDDENBYTE, as expected.
    The 8 pad bytes at tail=0x7f4ebba3fb44 are not all FORBIDDENBYTE (0xfb):
        at tail+0: 0x00 *** OUCH
        at tail+1: 0xfb
        at tail+2: 0xfb
        at tail+3: 0xfb
        at tail+4: 0xfb
        at tail+5: 0xfb
        at tail+6: 0xfb
        at tail+7: 0xfb
    The block was made by call #659785 to debug malloc/realloc.
    Data at p: 20 20 20 20 20 20 20 20 ... 20 20 20 20 20 31 32 33
Fatal Python error: bad trailing pad byte
Aborted (core dumped)
msg235055 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2015-01-30 21:36
New changeset e5d79e6deeb5 by Serhiy Storchaka in branch '2.7':
Issue #23055: Fixed off-by-one error in PyUnicode_FromFormatV.
https://hg.python.org/cpython/rev/e5d79e6deeb5
msg235056 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-01-30 21:39
Backported tests exposed off-by-one error in PyUnicode_FromFormatV. This error was fixed in 3.x in changeset ac768c8e13ac (issue7228).
msg235057 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-01-30 21:40
Thank you Stefan for pointing on tests failure.
msg235059 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2015-01-30 22:02
I think I still get a problem in 2.7:

[1/1] test_unicode
==23430== Invalid read of size 1
==23430==    at 0x484541: PyUnicodeUCS2_FromFormatV (unicodeobject.c:736)
==23430==    by 0x485C75: PyUnicodeUCS2_FromFormat (unicodeobject.c:1083)


736         for (f = format; *f; f++) {
(gdb) p format
$1 = 0x71d45f4 "%"
(gdb) p f
$2 = 0x71d45f6 ""


So format=="%", first f++ happens at 738, second f++ happens implicitly
at the end of the for loop. The *f condition in 736 is then an invalid
read.


Perhaps use while for the outer loop and a break? (It's just my
personal preference, I sometimes get confused by incrementing
at the end and also inside for loops.)
msg235060 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-01-30 22:20
Yes, I think following patch will help.
msg235062 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2015-01-30 23:01
issue23055_2.patch looks good.
msg235063 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2015-01-30 23:17
New changeset 245c9f372a34 by Serhiy Storchaka in branch '2.7':
Issue #23055: Fixed read-past-the-end error in PyUnicode_FromFormatV.
https://hg.python.org/cpython/rev/245c9f372a34

New changeset 9fe1d861f486 by Serhiy Storchaka in branch '3.2':
Issue #23055: Fixed read-past-the-end error in PyUnicode_FromFormatV.
https://hg.python.org/cpython/rev/9fe1d861f486
History
Date User Action Args
2015-02-01 11:13:49Arfreversetnosy: + Arfrever
2015-01-31 07:51:38serhiy.storchakasetstatus: open -> closed
2015-01-30 23:17:13python-devsetmessages: + msg235063
2015-01-30 23:01:32skrahsetmessages: + msg235062
2015-01-30 22:20:16serhiy.storchakasetfiles: + issue23055_2.patch

messages: + msg235060
2015-01-30 22:02:44skrahsetmessages: + msg235059
2015-01-30 21:40:24serhiy.storchakasetmessages: + msg235057
2015-01-30 21:39:15serhiy.storchakasetmessages: + msg235056
2015-01-30 21:36:05python-devsetmessages: + msg235055
2015-01-30 20:22:55serhiy.storchakasetstatus: closed -> open
2015-01-30 19:44:25skrahsetnosy: + skrah
messages: + msg235051
2015-01-27 20:46:13serhiy.storchakasetstatus: open -> closed
resolution: fixed
stage: patch review -> resolved
2015-01-27 20:33:44python-devsetnosy: + python-dev
messages: + msg234848
2015-01-27 20:01:25georg.brandlsetmessages: + msg234847
2015-01-27 19:40:15serhiy.storchakasetmessages: + msg234846
2014-12-16 12:31:37serhiy.storchakasetfiles: + issue23055-2.7-2.patch, issue23055-3.2.patch, issue23055-3.3.patch

messages: + msg232737
2014-12-16 01:09:08gvanrossumsetmessages: + msg232694
2014-12-16 00:39:58Guidosetmessages: + msg232692
2014-12-16 00:25:11Guidosetnosy: + Guido
messages: + msg232691
2014-12-15 23:46:59gvanrossumsetmessages: + msg232690
versions: + Python 3.2, Python 3.3
2014-12-15 23:38:37serhiy.storchakasetfiles: + issue23055-2.7.patch

components: + Interpreter Core
versions: - Python 3.4, Python 3.5
nosy: + serhiy.storchaka, georg.brandl, vstinner

messages: + msg232689
stage: patch review
2014-12-15 17:18:35gvanrossumcreate