classification
Title: Integer overflow in case_operation
Type: crash Stage: resolved
Components: Unicode Versions: Python 3.5, Python 3.4
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: Arfrever, benjamin.peterson, ezio.melotti, haypo, pkt, python-dev, serhiy.storchaka
Priority: normal Keywords: patch

Created on 2014-10-15 14:50 by pkt, last changed 2015-11-09 10:20 by serhiy.storchaka. This issue is now closed.

Files
File name Uploaded Description Edit
poc_case_op.py pkt, 2014-10-15 14:50 poc
test_case_operation_overflow_memory_error.patch serhiy.storchaka, 2015-04-03 12:46 review
Messages (24)
msg229455 - (view) Author: paul (pkt) Date: 2014-10-15 14:50
Crashes python 3.4.1. 

# Objects\unicodeobject.c
# 
# static PyObject *
# case_operation(PyObject *self,
#                Py_ssize_t (*perform)(int, void *, Py_ssize_t, Py_UCS4 *, Py_UCS4 *))
# {
#     PyObject *res = NULL;
#     Py_ssize_t length, newlength = 0;
#     int kind, outkind;
#     (...)
# 1   length = PyUnicode_GET_LENGTH(self);
# 2   tmp = PyMem_MALLOC(sizeof(Py_UCS4) * 3 * length);
#     (...)
# 3   newlength = perform(kind, data, length, tmp, &maxchar);
# 
# 1. there are no safety checks 
# 2. 12*length overflows
# 3. perform() writes to tmp buffer, which is too small to hold the result
msg229467 - (view) Author: Roundup Robot (python-dev) Date: 2014-10-15 15:49
New changeset 449b1f427cc7 by Benjamin Peterson in branch '3.3':
fix integer overflow in unicode case operations (closes #22643)
https://hg.python.org/cpython/rev/449b1f427cc7

New changeset 570e70252d5d by Benjamin Peterson in branch '3.4':
merge 3.3 (#22643)
https://hg.python.org/cpython/rev/570e70252d5d

New changeset c2980ec10a4c by Benjamin Peterson in branch 'default':
merge 3.4 (#22643)
https://hg.python.org/cpython/rev/c2980ec10a4c
msg229470 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2014-10-15 16:10
Benjamin, could you please first propose a patch for review instead of commiting directly your change? Especially for security related changes.

+    if (length > PY_SSIZE_T_MAX / 3 ||
+        length > PY_SIZE_MAX / (3 * sizeof(Py_UCS4))) {
+        PyErr_SetString(PyExc_OverflowError, "string is too long");
+        return NULL;
+    }
     tmp = PyMem_MALLOC(sizeof(Py_UCS4) * 3 * length);

PyMem_MALLOC() returns NULL if the length is larger than PY_SSIZE_T_MAX, so the overflow check doesn't look correct. The overflow check can be replaced with:

    if ((size_t)length > PY_SSIZE_T_MAX / (3 * sizeof(Py_UCS4))) ...
msg229471 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2014-10-15 16:11
Other changesets related to this issue:

changeset:   93071:6a91e616485a
branch:      3.3
parent:      93068:449b1f427cc7
user:        Benjamin Peterson <benjamin@python.org>
date:        Wed Oct 15 11:51:05 2014 -0400
files:       Objects/unicodeobject.c
description:
make sure length is unsigned

(followed by merged into 3.4 and 3.5)
msg229472 - (view) Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2014-10-15 16:15
Cool, I forgot about that.

On Wed, Oct 15, 2014, at 12:11, STINNER Victor wrote:
> 
> STINNER Victor added the comment:
> 
> Other changesets related to this issue:
> 
> changeset:   93071:6a91e616485a
> branch:      3.3
> parent:      93068:449b1f427cc7
> user:        Benjamin Peterson <benjamin@python.org>
> date:        Wed Oct 15 11:51:05 2014 -0400
> files:       Objects/unicodeobject.c
> description:
> make sure length is unsigned
> 
> (followed by merged into 3.4 and 3.5)
> 
> ----------
> 
> _______________________________________
> Python tracker <report@bugs.python.org>
> <http://bugs.python.org/issue22643>
> _______________________________________
msg229473 - (view) Author: Roundup Robot (python-dev) Date: 2014-10-15 16:21
New changeset f963cc1f96cf by Benjamin Peterson in branch '3.3':
it suffices to check for PY_SSIZE_T_MAX overflow (#22643)
https://hg.python.org/cpython/rev/f963cc1f96cf

New changeset 8195d48a5c43 by Benjamin Peterson in branch 'default':
merge 3.4 (closes #22643)
https://hg.python.org/cpython/rev/8195d48a5c43
msg229476 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-10-15 17:19
I think this is a place where _PyUnicodeWriter would be appropriate (of course this is different issue).
msg229478 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-10-15 17:29
The test should be decorated with the bigmemtest decorator. And I think that condition sys.maxsize < 2**32 would be more robust.
msg229479 - (view) Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2014-10-15 17:36
It's only 341 MB.

On Wed, Oct 15, 2014, at 13:29, Serhiy Storchaka wrote:
> 
> Serhiy Storchaka added the comment:
> 
> The test should be decorated with the bigmemtest decorator. And I think
> that condition sys.maxsize < 2**32 would be more robust.
> 
> ----------
> status: closed -> open
> 
> _______________________________________
> Python tracker <report@bugs.python.org>
> <http://bugs.python.org/issue22643>
> _______________________________________
msg229480 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-10-15 17:37
And this test is CPython only. It relies on specific implementation detail. After changing current implementation (which inefficiently uses memory) this test will be dropped.
msg229481 - (view) Author: Roundup Robot (python-dev) Date: 2014-10-15 17:41
New changeset 33290d0dd946 by Benjamin Peterson in branch '3.4':
merge 3.3 (#22643)
https://hg.python.org/cpython/rev/33290d0dd946

New changeset ffabb674140c by Benjamin Peterson in branch 'default':
merge 3.4 (#22643)
https://hg.python.org/cpython/rev/ffabb674140c
msg229482 - (view) Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2014-10-15 17:43
Since the memory requirement is less than 500MB, I don't think it needs a bigmemtest decorator.
msg229483 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-10-15 17:46
> It's only 341 MB.

It's 2**32//12*2 ~ 683 MiB for original string (but I'm not sure that non-
ASCII string is needed for this test) + 2**32 = 4GiB if the test will fail for 
some reason. Some buildbots AFAIK have memory less than 683 MiB.
msg229484 - (view) Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2014-10-15 17:48
It's Latin 1, so the chars only use one byte:

>>> sys.getsizeof("üü") - sys.getsizeof("ü")
1

On Wed, Oct 15, 2014, at 13:46, Serhiy Storchaka wrote:
> 
> Serhiy Storchaka added the comment:
> 
> > It's only 341 MB.
> 
> It's 2**32//12*2 ~ 683 MiB for original string (but I'm not sure that
> non-
> ASCII string is needed for this test) + 2**32 = 4GiB if the test will
> fail for 
> some reason. Some buildbots AFAIK have memory less than 683 MiB.
> 
> ----------
> 
> _______________________________________
> Python tracker <report@bugs.python.org>
> <http://bugs.python.org/issue22643>
> _______________________________________
msg229486 - (view) Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2014-10-15 17:51
More explicitly:

>>> sys.getsizeof("ü"*(2**32//12 + 1))//1024//1024
341
msg229551 - (view) Author: Arfrever Frehtes Taifersar Arahesis (Arfrever) * Date: 2014-10-16 20:28
Summary of commits:
  3.3:
    449b1f427cc7
    6a91e616485a
    f963cc1f96cf
    cda907a02a80
    b4a6be12a4bc
  3.4:
    570e70252d5d
    a6d0b1275d4b
    2a25076c50ad
    33290d0dd946
    2f90ea9e60ef
  3.5:
    c2980ec10a4c
    d0f8f242531b
    8195d48a5c43
    ffabb674140c
    030fda7b1de8
msg239956 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-04-03 06:19
http://buildbot.python.org/all/builders/PPC64%20AIX%203.x/builds/3466/steps/test/logs/stdio
======================================================================
ERROR: test_case_operation_overflow (test.test_unicode.UnicodeTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/shager/cpython-buildarea/3.x.edelsohn-aix-ppc64/build/Lib/test/test_unicode.py", line 852, in test_case_operation_overflow
    self.assertRaises(OverflowError, ("�"*(2**32//12 + 1)).upper)
MemoryError

----------------------------------------------------------------------
msg239959 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2015-04-03 08:53
>     self.assertRaises(OverflowError, ("�"*(2**32//12 + 1)).upper)
> MemoryError

Hum, even with the PEP 393, this string is still large: 682 MB.

$ python3
Python 3.4.1 (default, Nov  3 2014, 14:38:10) 
>>> sys.getsizeof("�"*(2**32//12 + 1)) / 1024.**2
682.6667385101318

I guess that Serhiy suggests to put a @bigmemtest decorator on test_case_operation_overflow().

The test expects that large_string.upper() raises immediatly an OverflowError, so I understand that a new string is not created.

Hum, it's strange to have an OverflowError here. Integer overflow on memory allocation usually raises MemoryError, not OverflowError. It looks like unicodeobject.c is not consistent here.
msg239960 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-04-03 09:06
It's strange to have "�" here. Should be "ü".
msg239970 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2015-04-03 11:37
> It's strange to have "�" here. Should be "ü".

Oh right, I saw "�" in the bug tracker and in the buildbot output. But
it's a web browser issue, if you download the raw buildbout output,
you will see "ü". So the string takes less memory (1 byte per
character, not 2).
msg239985 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-04-03 12:46
Perhaps adding bigmemtest is not worth if memory requirements are less than 1 GiB. But I would separate the creating of long tested string and the calling of case converting method, so we will sure what operation is failed. Proposed patch skips the test if can't create needed string and clean locals, so failing test_case_operation_overflow no longer causes multiple MemoryErrors in following tests.
msg239986 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2015-04-03 12:53
> Perhaps adding bigmemtest is not worth if memory requirements are less than 1 GiB.

What is the current limit if you don't pass -M option to regrtest? In @bigimemtest, I see "maxsize = 5147". Is it a number of bytes? 5 kB of memory?

IMO it's interested to use @bigmemtest if a test uses 100 MB of memory or more, but regrtest should use tests using less than 500 MB by default (or maybe less than 1 GB by default).
msg239987 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-04-03 13:02
bigimem tests are ran with dummy size = 5147 if dry_run = True (default) and skipped if dry_run = False. Almost always you need dry_run = False.

The -M option doesn't work with less than 1 GiB.

In this case likely the buildbot had enough memory at the start, but then some memory leaked in tests (perhaps mainly in tracebacks of failed tests).
msg254272 - (view) Author: Roundup Robot (python-dev) Date: 2015-11-07 14:56
New changeset 5fae49ef94fd by Serhiy Storchaka in branch '3.4':
Issue #22643: Skip test_case_operation_overflow on computers with low memory.
https://hg.python.org/cpython/rev/5fae49ef94fd

New changeset 6b00bee218ff by Serhiy Storchaka in branch '3.5':
Issue #22643: Skip test_case_operation_overflow on computers with low memory.
https://hg.python.org/cpython/rev/6b00bee218ff

New changeset b1c5949a3af4 by Serhiy Storchaka in branch 'default':
Issue #22643: Skip test_case_operation_overflow on computers with low memory.
https://hg.python.org/cpython/rev/b1c5949a3af4
History
Date User Action Args
2015-11-09 10:20:48serhiy.storchakasetstatus: open -> closed
resolution: fixed
stage: patch review -> resolved
2015-11-07 14:56:03python-devsetmessages: + msg254272
2015-04-03 13:02:42serhiy.storchakasetmessages: + msg239987
2015-04-03 12:53:33hayposetmessages: + msg239986
2015-04-03 12:46:44serhiy.storchakasetfiles: + test_case_operation_overflow_memory_error.patch
keywords: + patch
stage: resolved -> patch review
versions: - Python 3.3
2015-04-03 12:46:07serhiy.storchakasetmessages: + msg239985
2015-04-03 11:37:10hayposetmessages: + msg239970
2015-04-03 09:06:09serhiy.storchakasetmessages: + msg239960
2015-04-03 08:53:54hayposetstatus: closed -> open
resolution: fixed -> (no value)
messages: + msg239959
2015-04-03 06:19:26serhiy.storchakasetmessages: + msg239956
2014-10-16 20:28:47Arfreversetnosy: + Arfrever

messages: + msg229551
versions: + Python 3.3, Python 3.5
2014-10-15 17:51:10benjamin.petersonsetmessages: + msg229486
2014-10-15 17:48:59benjamin.petersonsetmessages: + msg229484
2014-10-15 17:46:55serhiy.storchakasetmessages: + msg229483
2014-10-15 17:43:45benjamin.petersonsetstatus: open -> closed

messages: + msg229482
2014-10-15 17:41:10python-devsetmessages: + msg229481
2014-10-15 17:37:30serhiy.storchakasetmessages: + msg229480
2014-10-15 17:36:17benjamin.petersonsetmessages: + msg229479
2014-10-15 17:29:39serhiy.storchakasetstatus: closed -> open

messages: + msg229478
2014-10-15 17:19:20serhiy.storchakasetmessages: + msg229476
2014-10-15 16:21:27python-devsetstatus: open -> closed
resolution: fixed
messages: + msg229473
2014-10-15 16:15:59benjamin.petersonsetmessages: + msg229472
2014-10-15 16:11:15hayposetmessages: + msg229471
2014-10-15 16:10:29hayposetstatus: closed -> open
resolution: fixed -> (no value)
messages: + msg229470
2014-10-15 15:49:51python-devsetstatus: open -> closed

nosy: + python-dev
messages: + msg229467

resolution: fixed
stage: resolved
2014-10-15 15:17:15ezio.melottisetnosy: + benjamin.peterson, serhiy.storchaka
type: security -> crash
2014-10-15 15:15:26hayposetnosy: + ezio.melotti
components: + Unicode
2014-10-15 15:15:21hayposetnosy: + haypo
2014-10-15 14:50:30pktcreate