classification
Title: integer overflow in unicodedata.normalize
Type: crash Stage: resolved
Components: Unicode Versions: Python 3.5, Python 3.3, Python 3.4, Python 2.7
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: Arfrever, benjamin.peterson, ezio.melotti, pkt, python-dev, serhiy.storchaka, vstinner
Priority: normal Keywords:

Created on 2015-02-01 13:57 by pkt, last changed 2015-03-03 19:45 by serhiy.storchaka. This issue is now closed.

Files
File name Uploaded Description
poc_unidata_normalize.py pkt, 2015-02-01 13:57
Messages (8)
msg235175 - Author: paul (pkt) Date: 2015-02-01 13:57
# Bug
# ---
# 
# static PyObject*
# unicodedata_normalize(PyObject *self, PyObject *args)
# {
#     ...
#     if (strcmp(form, "NFKC") == 0) {
#         if (is_normalized(self, input, 1, 1)) {
#             Py_INCREF(input);
#             return input;
#         }
#         return nfc_nfkc(self, input, 1);
# 
# We need to get past the is_normalized() check, i.e. make it return false (a
# string of repeated \xa0 characters, which NFKC maps to spaces, takes care of
# that). nfc_nfkc then calls:
# 
# static PyObject*
# nfd_nfkd(PyObject *self, PyObject *input, int k)
# {
#     ...
#     Py_ssize_t space, isize;
#     ...
#     isize = PyUnicode_GET_LENGTH(input);
#     /* Overallocate at most 10 characters. */
#     space = (isize > 10 ? 10 : isize) + isize;
#     osize = space;
# 1   output = PyMem_Malloc(space * sizeof(Py_UCS4));
# 
# 1. If isize = 2^30, then space = 2^30 + 10, so on a 32-bit build
#    space * sizeof(Py_UCS4) = (2^30 + 10) * 4 == 40 (modulo 2^32), and
#    PyMem_Malloc allocates a buffer far too small to hold the result.
# 
# Crash
# -----
# 
# nfd_nfkd (self=<module at remote 0x4056e574>, input='...', k=1) at /home/p/Python-3.4.1/Modules/unicodedata.c:552
# 552         stackptr = 0;
# (gdb) n
# 553         isize = PyUnicode_GET_LENGTH(input);
# (gdb) n
# 555         space = (isize > 10 ? 10 : isize) + isize;
# (gdb) n
# 556         osize = space;
# (gdb) n
# 557         output = PyMem_Malloc(space * sizeof(Py_UCS4));
# (gdb) print space
# $9 = 1073741834
# (gdb) print space*4
# $10 = 40
# (gdb) c
# Continuing.
#  
# Program received signal SIGSEGV, Segmentation fault.
# 0x40579cbb in nfd_nfkd (self=<module at remote 0x4056e574>, input='', k=1) at /home/p/Python-3.4.1/Modules/unicodedata.c:614
# 614                     output[o++] = code;
# 
# OS info
# -------
# 
# % ./python -V
# Python 3.4.1
#  
# % uname -a
# Linux ubuntu 3.8.0-29-generic #42~precise1-Ubuntu SMP Wed Aug 14 15:31:16 UTC 2013 i686 i686 i386 GNU/Linux
 
 
import unicodedata as ud

# Reproduces on a 32-bit build: 2**30 copies of U+00A0 (NO-BREAK SPACE).
s = "\xa0" * (2**30)
ud.normalize("NFKC", s)
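The wraparound described in point 1 of the report can be checked without a 32-bit machine. The sketch below is a hypothetical emulation, not CPython code: the helper names `output_space` and `malloc_request` are illustrative, and it assumes a 32-bit `size_t` and `sizeof(Py_UCS4) == 4`, masking the product to 32 bits the way unsigned C arithmetic wraps.

```python
# Sketch: emulate nfd_nfkd()'s allocation-size arithmetic on a 32-bit build.
# Assumptions: size_t is 32 bits wide and sizeof(Py_UCS4) == 4.
SIZE_T_MASK = (1 << 32) - 1
SIZEOF_PY_UCS4 = 4


def output_space(isize: int) -> int:
    """Mirror `space = (isize > 10 ? 10 : isize) + isize;`."""
    return (10 if isize > 10 else isize) + isize


def malloc_request(space: int) -> int:
    """Mirror `space * sizeof(Py_UCS4)`, truncated to a 32-bit size_t."""
    return (space * SIZEOF_PY_UCS4) & SIZE_T_MASK


isize = 2**30
space = output_space(isize)
print(space)                  # 1073741834
print(malloc_request(space))  # 40: a 40-byte buffer for ~2**30 code points
```

On a 64-bit build the same product (2^32 + 40 bytes) fits in `size_t`, which is why the reproducer needs an i686 build like the one in the report.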
msg237058 - Author: Roundup Robot (python-dev) Date: 2015-03-02 16:21
New changeset 84025a32fa2b by Benjamin Peterson in branch '3.3':
fix possible overflow bugs in unicodedata (closes #23367)
https://hg.python.org/cpython/rev/84025a32fa2b

New changeset 90f960e79c9e by Benjamin Peterson in branch '3.4':
merge 3.3 (#23367)
https://hg.python.org/cpython/rev/90f960e79c9e

New changeset 93244000efea by Benjamin Peterson in branch 'default':
merge 3.4 (#23367)
https://hg.python.org/cpython/rev/93244000efea

New changeset 3019effc44f2 by Benjamin Peterson in branch '2.7':
fix possible overflow bugs in unicodedata (closes #23367)
https://hg.python.org/cpython/rev/3019effc44f2
msg237062 - Author: Serhiy Storchaka (serhiy.storchaka) (Python committer) Date: 2015-03-02 16:58
Actually integer overflow in the line

    space = (isize > 10 ? 10 : isize) + isize;

is not possible.

Integer overflows in PyMem_Malloc were fixed in issue23446.
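For context, the kind of guard that stops the multiplication from wrapping can be sketched as follows. This is an illustration modelled on CPython's `PyMem_New` macro, which returns NULL when the element count exceeds `PY_SSIZE_T_MAX / sizeof(type)`; it is not necessarily what the issue23446 change does, `checked_alloc_bytes` is a hypothetical name, and a 32-bit build (`PY_SSIZE_T_MAX == 2**31 - 1`) is assumed.

```python
# Sketch of a PyMem_New-style overflow guard, assuming a 32-bit build
# (PY_SSIZE_T_MAX == 2**31 - 1) and sizeof(Py_UCS4) == 4.
PY_SSIZE_T_MAX = 2**31 - 1
SIZEOF_PY_UCS4 = 4


def checked_alloc_bytes(n, elem_size=SIZEOF_PY_UCS4):
    """Return n * elem_size, or None (the C macro yields NULL) if it would overflow."""
    if n > PY_SSIZE_T_MAX // elem_size:
        return None
    return n * elem_size


print(checked_alloc_bytes(2**30 + 10))  # None: the request is refused
print(checked_alloc_bytes(1000))        # 4000
```

Checking the count against the maximum divided by the element size avoids ever computing the overflowing product.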
msg237068 - Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2015-03-02 17:58
Why can't (isize > 10 ? 10 : isize) + isize overflow?
msg237077 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-03-02 19:24
Because isize is the size of a real PyUnicode object. Its maximal value is PY_SSIZE_T_MAX - sizeof(PyASCIIObject) - 1.
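This bound can be made concrete. The check below assumes a 32-bit build and treats the header size as a parameter standing in for `sizeof(PyASCIIObject)`; the values tried are hypothetical. With the largest possible `isize`, `space` equals `PY_SSIZE_T_MAX - header_size + 9`, so the addition cannot overflow as long as the header is at least 9 bytes.

```python
# Hypothetical check of the bound above, assuming a 32-bit build.
# header_size stands in for sizeof(PyASCIIObject); the values tried are guesses.
PY_SSIZE_T_MAX = 2**31 - 1


def max_space(header_size: int) -> int:
    """Largest `(isize > 10 ? 10 : isize) + isize` for the largest real str."""
    max_isize = PY_SSIZE_T_MAX - header_size - 1
    return (10 if max_isize > 10 else max_isize) + max_isize


for header_size in (9, 16, 24, 48):
    headroom = PY_SSIZE_T_MAX - max_space(header_size)
    print(header_size, headroom)  # headroom == header_size - 9, never negative
```

This only rules out overflow in the addition; the later multiplication by `sizeof(Py_UCS4)` is a separate concern.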
msg237080 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2015-03-02 20:17
Well, the test doesn't hurt.
msg237084 - Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2015-03-02 20:54
True, but that could change and is not true in Python 2. I suppose we
could revert the change and add a static assertion.
msg237158 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-03-03 19:45
The test doesn't hurt.
History
Date User Action Args
2015-03-03 19:45:10 serhiy.storchaka set messages: + msg237158
2015-03-03 05:13:01 Arfrever set versions: + Python 2.7, Python 3.3, Python 3.5
2015-03-02 20:54:26 benjamin.peterson set messages: + msg237084
2015-03-02 20:17:09 vstinner set messages: + msg237080
2015-03-02 19:24:18 serhiy.storchaka set messages: + msg237077
2015-03-02 17:58:09 benjamin.peterson set nosy: + benjamin.peterson
messages: + msg237068
2015-03-02 16:58:28 serhiy.storchaka set messages: + msg237062
2015-03-02 16:21:38 python-dev set status: open -> closed
nosy: + python-dev
messages: + msg237058
resolution: fixed
stage: resolved
2015-03-02 08:21:26 ezio.melotti set nosy: + ezio.melotti, vstinner, serhiy.storchaka
components: + Unicode
2015-02-01 21:17:48 Arfrever set nosy: + Arfrever
2015-02-01 13:57:15 pkt create