integer overflow in unicodedata.normalize #67556

Closed
pkt mannequin opened this issue Feb 1, 2015 · 8 comments
Labels
topic-unicode type-crash A hard crash of the interpreter, possibly with a core dump

Comments

@pkt
Mannequin

pkt mannequin commented Feb 1, 2015

BPO 23367
Nosy @vstinner, @benjaminp, @ezio-melotti, @serhiy-storchaka
Files
  • poc_unidata_normalize.py
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    GitHub fields:

    assignee = None
    closed_at = <Date 2015-03-02.16:21:38.800>
    created_at = <Date 2015-02-01.13:57:15.250>
    labels = ['expert-unicode', 'type-crash']
    title = 'integer overflow in unicodedata.normalize'
    updated_at = <Date 2015-03-03.19:45:10.819>
    user = 'https://bugs.python.org/pkt'

    bugs.python.org fields:

    activity = <Date 2015-03-03.19:45:10.819>
    actor = 'serhiy.storchaka'
    assignee = 'none'
    closed = True
    closed_date = <Date 2015-03-02.16:21:38.800>
    closer = 'python-dev'
    components = ['Unicode']
    creation = <Date 2015-02-01.13:57:15.250>
    creator = 'pkt'
    dependencies = []
    files = ['37966']
    hgrepos = []
    issue_num = 23367
    keywords = []
    message_count = 8.0
    messages = ['235175', '237058', '237062', '237068', '237077', '237080', '237084', '237158']
    nosy_count = 7.0
    nosy_names = ['vstinner', 'benjamin.peterson', 'ezio.melotti', 'Arfrever', 'python-dev', 'serhiy.storchaka', 'pkt']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'crash'
    url = 'https://bugs.python.org/issue23367'
    versions = ['Python 2.7', 'Python 3.3', 'Python 3.4', 'Python 3.5']

    @pkt
    Mannequin Author

    pkt mannequin commented Feb 1, 2015

    # Bug
    # ---
    #
    # static PyObject*
    # unicodedata_normalize(PyObject *self, PyObject *args)
    # {
    # ...
    # if (strcmp(form, "NFKC") == 0) {
    # if (is_normalized(self, input, 1, 1)) {
    # Py_INCREF(input);
    # return input;
    # }
    # return nfc_nfkc(self, input, 1);
    #
    # We need to get past the is_normalized() check, i.e. it must report the input
    # as not normalized (the repeated \xa0 char takes care of that). nfc_nfkc then calls:
    #
    # static PyObject*
    # nfd_nfkd(PyObject *self, PyObject *input, int k)
    # {
    # ...
    # Py_ssize_t space, isize;
    # ...
    # isize = PyUnicode_GET_LENGTH(input);
    # /* Overallocate at most 10 characters. */
    # space = (isize > 10 ? 10 : isize) + isize;
    # osize = space;
    # output = PyMem_Malloc(space * sizeof(Py_UCS4));   /* (1) */
    #
    # (1) On a 32-bit build, if isize = 2^30, then space = 2^30 + 10, so
    # space * sizeof(Py_UCS4) = (2^30 + 10) * 4 == 40 (modulo 2^32), and
    # PyMem_Malloc allocates a buffer too small to hold the result.
    #
    # Crash
    # -----
    #
    # nfd_nfkd (self=<module at remote 0x4056e574>, input='...', k=1) at /home/p/Python-3.4.1/Modules/unicodedata.c:552
    # 552 stackptr = 0;
    # (gdb) n
    # 553 isize = PyUnicode_GET_LENGTH(input);
    # (gdb) n
    # 555 space = (isize > 10 ? 10 : isize) + isize;
    # (gdb) n
    # 556 osize = space;
    # (gdb) n
    # 557 output = PyMem_Malloc(space * sizeof(Py_UCS4));
    # (gdb) print space
    # $9 = 1073741834
    # (gdb) print space*4
    # $10 = 40
    # (gdb) c
    # Continuing.
    #
    # Program received signal SIGSEGV, Segmentation fault.
    # 0x40579cbb in nfd_nfkd (self=<module at remote 0x4056e574>, input='', k=1) at /home/p/Python-3.4.1/Modules/unicodedata.c:614
    # 614 output[o++] = code;
    #
    # OS info
    # -------

    # 
    # % ./python -V
    # Python 3.4.1
    #  
    # % uname -a
    # Linux ubuntu 3.8.0-29-generic #42~precise1-Ubuntu SMP Wed Aug 14 15:31:16 UTC 2013 i686 i686 i386 GNU/Linux
     
     
    import unicodedata as ud
    s="\xa0"*(2**30)
    ud.normalize("NFKC", s)
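The size arithmetic from the report can be checked without building a 2^30-character string. The following is only an illustrative sketch: `space_for` and `malloc_request` are hypothetical helper names, not functions from `unicodedata.c`, and the 32-bit `size_t` width is an assumption taken from the i686 platform in the report.

```python
UCS4_SIZE = 4  # sizeof(Py_UCS4)


def space_for(isize):
    """Same expression as in nfd_nfkd: overallocate at most 10 characters."""
    return (10 if isize > 10 else isize) + isize


def malloc_request(isize, size_t_bits=32):
    """Byte count that reaches PyMem_Malloc once the product is
    truncated to the width of size_t (assumed 32 bits by default)."""
    return (space_for(isize) * UCS4_SIZE) % (2 ** size_t_bits)


isize = 2 ** 30
print(space_for(isize))           # 1073741834, matches `print space` in gdb
print(malloc_request(isize))      # 40, matches `print space*4` in gdb
print(malloc_request(isize, 64))  # 4294967336, no wraparound with a 64-bit size_t
```

With a 64-bit `size_t` the product does not wrap for this input, which would explain why the crash was reported on a 32-bit build.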

    @pkt pkt mannequin added the type-crash A hard crash of the interpreter, possibly with a core dump label Feb 1, 2015
    @python-dev
    Mannequin

    python-dev mannequin commented Mar 2, 2015

    New changeset 84025a32fa2b by Benjamin Peterson in branch '3.3':
    fix possible overflow bugs in unicodedata (closes bpo-23367)
    https://hg.python.org/cpython/rev/84025a32fa2b

    New changeset 90f960e79c9e by Benjamin Peterson in branch '3.4':
    merge 3.3 (bpo-23367)
    https://hg.python.org/cpython/rev/90f960e79c9e

    New changeset 93244000efea by Benjamin Peterson in branch 'default':
    merge 3.4 (bpo-23367)
    https://hg.python.org/cpython/rev/93244000efea

    New changeset 3019effc44f2 by Benjamin Peterson in branch '2.7':
    fix possible overflow bugs in unicodedata (closes bpo-23367)
    https://hg.python.org/cpython/rev/3019effc44f2
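The changesets themselves are not quoted in this thread, so the following is not the actual patch. It is a minimal sketch of the usual guard against this class of bug: compare the element count against the limit divided by the element size before multiplying. The name `checked_alloc_size` and the 32-bit limit are assumptions made for illustration.

```python
PY_SSIZE_T_MAX_32 = 2 ** 31 - 1  # assumed limit on a 32-bit build


def checked_alloc_size(count, item_size, ssize_max=PY_SSIZE_T_MAX_32):
    """Return count * item_size, or raise MemoryError if the product
    would not fit in ssize_max (checked before multiplying)."""
    if count > ssize_max // item_size:
        raise MemoryError("allocation size would overflow")
    return count * item_size


print(checked_alloc_size(10, 4))  # 40, a genuinely small request
try:
    checked_alloc_size(2 ** 30 + 10, 4)
except MemoryError as exc:
    print("rejected:", exc)
```

The division form avoids computing the overflowing product at all, which matters in C where the wrapped value is indistinguishable from a small legitimate request.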

    @python-dev python-dev mannequin closed this as completed Mar 2, 2015
    @serhiy-storchaka
    Member

    Actually, integer overflow in the line

        space = (isize > 10 ? 10 : isize) + isize;

    is not possible.

    Integer overflows in PyMem_Malloc were fixed in bpo-23446.

    @benjaminp
    Contributor

    Why can't (isize > 10 ? 10 : isize) + isize overflow?

    @serhiy-storchaka
    Member

    Because isize is the size of a real PyUnicode object. Its maximum value is PY_SSIZE_T_MAX - sizeof(PyASCIIObject) - 1.
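The bound above can be turned into a quick check of the addition alone. This is a sketch under stated assumptions: `header_size` stands in for `sizeof(PyASCIIObject)`, whose real value depends on the build and is not given in this thread, and `max_space` and `addition_overflows` are hypothetical helper names.

```python
import sys


def max_space(header_size, ssize_max=sys.maxsize):
    """Largest value of (isize > 10 ? 10 : isize) + isize when isize is
    capped at ssize_max - header_size - 1."""
    max_isize = ssize_max - header_size - 1
    return (10 if max_isize > 10 else max_isize) + max_isize


def addition_overflows(header_size, ssize_max=sys.maxsize):
    """True if the addition could exceed ssize_max for that cap."""
    return max_space(header_size, ssize_max) > ssize_max


# Hypothetical header sizes; only the threshold matters here.
for header_size in (0, 8, 9, 24, 48):
    print(header_size, addition_overflows(header_size))
```

Under this model the addition stays in range whenever the header is at least 9 bytes, so any realistic object header satisfies it. The check covers only the addition, not the later multiplication by `sizeof(Py_UCS4)`.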

    @vstinner
    Member

    vstinner commented Mar 2, 2015

    Well, the test doesn't hurt.

    @benjaminp
    Contributor

    True, but that could change and is not true in Python 2. I suppose we
    could revert the change and add a static assertion.

    @serhiy-storchaka
    Member

    The test doesn't hurt.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022