integer overflow in unicodedata.normalize #67556

pkt · 2015-02-01T13:57:15Z

BPO	23367
Nosy	@vstinner, @benjaminp, @ezio-melotti, @serhiy-storchaka
Files	poc_unidata_normalize.py

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


GitHub fields:

assignee = None
closed_at = <Date 2015-03-02.16:21:38.800>
created_at = <Date 2015-02-01.13:57:15.250>
labels = ['expert-unicode', 'type-crash']
title = 'integer overflow in unicodedata.normalize'
updated_at = <Date 2015-03-03.19:45:10.819>
user = 'https://bugs.python.org/pkt'

bugs.python.org fields:

activity = <Date 2015-03-03.19:45:10.819>
actor = 'serhiy.storchaka'
assignee = 'none'
closed = True
closed_date = <Date 2015-03-02.16:21:38.800>
closer = 'python-dev'
components = ['Unicode']
creation = <Date 2015-02-01.13:57:15.250>
creator = 'pkt'
dependencies = []
files = ['37966']
hgrepos = []
issue_num = 23367
keywords = []
message_count = 8.0
messages = ['235175', '237058', '237062', '237068', '237077', '237080', '237084', '237158']
nosy_count = 7.0
nosy_names = ['vstinner', 'benjamin.peterson', 'ezio.melotti', 'Arfrever', 'python-dev', 'serhiy.storchaka', 'pkt']
pr_nums = []
priority = 'normal'
resolution = 'fixed'
stage = 'resolved'
status = 'closed'
superseder = None
type = 'crash'
url = 'https://bugs.python.org/issue23367'
versions = ['Python 2.7', 'Python 3.3', 'Python 3.4', 'Python 3.5']

pkt · 2015-02-01T13:57:15Z

# Bug
# ---
#
# static PyObject*
# unicodedata_normalize(PyObject *self, PyObject *args)
# {
# ...
# if (strcmp(form, "NFKC") == 0) {
# if (is_normalized(self, input, 1, 1)) {
# Py_INCREF(input);
# return input;
# }
# return nfc_nfkc(self, input, 1);
#
# We need to get past the is_normalized() check, i.e. make it return false; a
# string of repeated \xa0 (NO-BREAK SPACE), which is not in NFKC form, takes
# care of that. nfc_nfkc then calls:
#
# static PyObject*
# nfd_nfkd(PyObject *self, PyObject *input, int k)
# {
# ...
# Py_ssize_t space, isize;
# ...
# isize = PyUnicode_GET_LENGTH(input);
# /* Overallocate at most 10 characters. */
# space = (isize > 10 ? 10 : isize) + isize;
# osize = space;
# output = PyMem_Malloc(space * sizeof(Py_UCS4));   /* (1) */
#
# (1) If isize = 2^30, then space = 2^30 + 10, so on a 32-bit build
# space * sizeof(Py_UCS4) = (2^30 + 10) * 4 == 40 (modulo 2^32), and
# PyMem_Malloc allocates a 40-byte buffer, far too small to hold the result.
#
# Crash
# -----
#
# nfd_nfkd (self=<module at remote 0x4056e574>, input='...', k=1) at /home/p/Python-3.4.1/Modules/unicodedata.c:552
# 552 stackptr = 0;
# (gdb) n
# 553 isize = PyUnicode_GET_LENGTH(input);
# (gdb) n
# 555 space = (isize > 10 ? 10 : isize) + isize;
# (gdb) n
# 556 osize = space;
# (gdb) n
# 557 output = PyMem_Malloc(space * sizeof(Py_UCS4));
# (gdb) print space
# $9 = 1073741834
# (gdb) print space*4
# $10 = 40
# (gdb) c
# Continuing.
#
# Program received signal SIGSEGV, Segmentation fault.
# 0x40579cbb in nfd_nfkd (self=<module at remote 0x4056e574>, input='', k=1) at /home/p/Python-3.4.1/Modules/unicodedata.c:614
# 614 output[o++] = code;
#
# OS info
# -------
#
# % ./python -V
# Python 3.4.1
#
# % uname -a
# Linux ubuntu 3.8.0-29-generic #42~precise1-Ubuntu SMP Wed Aug 14 15:31:16 UTC 2013 i686 i686 i386 GNU/Linux

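# Reported to segfault on a 32-bit (i686) build of Python 3.4.1; s alone needs ~1 GiB.
import unicodedata as ud

s = "\xa0" * (2**30)      # 2**30 NO-BREAK SPACEs: not NFKC-normalized
ud.normalize("NFKC", s)   # reaches nfd_nfkd() with isize == 2**30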
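The size computation described above can be replayed in plain Python. This is a hypothetical model, not CPython code. `malloc_request_bytes` is an illustrative helper that evaluates the same `space` expression as nfd_nfkd and then truncates the byte count to an unsigned integer of the given width. It assumes a 32-bit size_t, as on the reporter's i686 build.

```python
import unicodedata

UCS4_SIZE = 4  # sizeof(Py_UCS4) in bytes


def malloc_request_bytes(isize: int, size_t_bits: int = 32) -> int:
    """Byte count nfd_nfkd would hand to PyMem_Malloc, truncated to a size_t
    of the given width (hypothetical helper emulating C unsigned wraparound)."""
    space = (10 if isize > 10 else isize) + isize  # same expression as the C code
    return (space * UCS4_SIZE) % (1 << size_t_bits)  # unsigned arithmetic wraps


# The reporter's case: 2**30 characters on a 32-bit build.
print(malloc_request_bytes(2**30))      # 40 bytes instead of ~4 GiB
print(malloc_request_bytes(2**30, 64))  # 4294967336: no wraparound with a 64-bit size_t

# Why the PoC uses U+00A0: it is not in NFKC form, so is_normalized() is false
# and normalize() goes on to nfc_nfkc()/nfd_nfkd().
print(unicodedata.normalize("NFKC", "\xa0") == " ")  # True
```

The same request that wraps to 40 bytes with a 32-bit size_t is a correct (if huge) 4 GiB request with a 64-bit one. That is consistent with the crash being reported on a 32-bit system.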

python-dev · 2015-03-02T16:21:39Z

New changeset 84025a32fa2b by Benjamin Peterson in branch '3.3':
fix possible overflow bugs in unicodedata (closes bpo-23367)
https://hg.python.org/cpython/rev/84025a32fa2b

New changeset 90f960e79c9e by Benjamin Peterson in branch '3.4':
merge 3.3 (bpo-23367)
https://hg.python.org/cpython/rev/90f960e79c9e

New changeset 93244000efea by Benjamin Peterson in branch 'default':
merge 3.4 (bpo-23367)
https://hg.python.org/cpython/rev/93244000efea

New changeset 3019effc44f2 by Benjamin Peterson in branch '2.7':
fix possible overflow bugs in unicodedata (closes bpo-23367)
https://hg.python.org/cpython/rev/3019effc44f2

serhiy-storchaka · 2015-03-02T16:58:28Z

Actually integer overflow in the line

    space = (isize > 10 ? 10 : isize) + isize;

is not possible.

Integer overflows in PyMem_Malloc were fixed in bpo-23446.
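The overflow at issue is in the multiplication passed to PyMem_Malloc, not in `space` itself. A common guard, and the one CPython's `PyMem_New(type, n)` macro applies, rejects any element count larger than `PY_SSIZE_T_MAX / sizeof(type)` before multiplying. The Python model below is a sketch under that assumption. `checked_alloc_bytes` is a hypothetical name, and this is not the actual bpo-23446 patch.

```python
def checked_alloc_bytes(n: int, elem_size: int, ssize_t_bits: int = 32):
    """Return n * elem_size, or None when the product would exceed
    PY_SSIZE_T_MAX (hypothetical helper modelled on PyMem_New's guard)."""
    py_ssize_t_max = (1 << (ssize_t_bits - 1)) - 1
    if n > py_ssize_t_max // elem_size:
        return None  # the C macro yields NULL here instead of a short buffer
    return n * elem_size


print(checked_alloc_bytes(2**30 + 10, 4))  # None: request refused on 32-bit
print(checked_alloc_bytes(1000, 4))        # 4000
```

With such a guard the caller sees a failed allocation and can report a MemoryError, instead of writing past a 40-byte buffer.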

benjaminp · 2015-03-02T17:58:09Z

Why can't (isize > 10 ? 10 : isize) + isize overflow?

serhiy-storchaka · 2015-03-02T19:24:18Z

Because isize is the size of a real PyUnicode object. Its maximal value is PY_SSIZE_T_MAX - sizeof(PyASCIIObject) - 1.
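This argument can be checked mechanically. `space` grows with `isize`, so it is enough to test the largest allowed `isize`. The sketch below assumes the bound quoted above. `header_size` stands in for `sizeof(PyASCIIObject)`, whose real value depends on the platform and build, so the numbers passed in are illustrative assumptions. `space_fits` is a hypothetical helper.

```python
def space_fits(ssize_t_bits: int, header_size: int) -> bool:
    """True if space = min(isize, 10) + isize stays <= PY_SSIZE_T_MAX for the
    largest isize allowed by PY_SSIZE_T_MAX - header_size - 1 (hypothetical model)."""
    py_ssize_t_max = (1 << (ssize_t_bits - 1)) - 1
    max_isize = py_ssize_t_max - header_size - 1  # the bound quoted above
    space = min(max_isize, 10) + max_isize        # same expression as nfd_nfkd
    return space <= py_ssize_t_max


print(space_fits(32, 24))  # True: any header of 9+ bytes leaves room for the +10
print(space_fits(32, 8))   # False: the argument depends on the header size
```

This only bounds `space`. The later `space * sizeof(Py_UCS4)` can still exceed the representable range, and that product is where the 32-bit wraparound in the report happens.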

vstinner · 2015-03-02T20:17:10Z

Well, the test doesn't hurt.

benjaminp · 2015-03-02T20:54:27Z

True, but that could change and is not true in Python 2. I suppose we
could revert the change and add a static assertion.

serhiy-storchaka · 2015-03-03T19:45:11Z

The test doesn't hurt.

pkt mannequin added the type-crash label (A hard crash of the interpreter, possibly with a core dump) Feb 1, 2015

ezio-melotti added the topic-unicode label Mar 2, 2015

python-dev mannequin closed this as completed Mar 2, 2015

ezio-melotti transferred this issue from another repository Apr 10, 2022
