Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CVE-2012-2135: Vulnerability in the utf-16 decoder after error handling #58784

Closed
serhiy-storchaka opened this issue Apr 14, 2012 · 26 comments
Closed
Labels
interpreter-core (Objects, Python, Grammar, and Parser dirs) topic-unicode type-security A security issue

Comments

@serhiy-storchaka
Copy link
Member

BPO 14579
Nosy @loewis, @birkenfeld, @pitrou, @vstinner, @benjaminp, @ezio-melotti, @asvetlov, @serhiy-storchaka
Files
  • utf16_update_after_error.patch: Patch for Python 3.3
  • utf16_update_after_error-3.2.patch: Patch for Python 3.2
  • utf16crasher.py
  • utf16_error_handling-3.2.patch: Patch for Python 3.2. Сorrecting all bugs.
  • utf16_update_after_error-3.2.patch: Patch for Python 3.2. Сorrecting only critical bug.
  • utf16_error_handling-3.2_3.patch
  • utf16_error_handling-3.2_4.patch: Patch for Python 3.2
  • utf16_error_handling-2.7.patch: Patch for Python 2.7
  • utf16_error_handling_tests-3.3.patch: Patch for Python 3.3
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2012-07-20.22:54:34.095>
    created_at = <Date 2012-04-14.18:46:02.989>
    labels = ['type-security', 'interpreter-core', 'expert-unicode']
    title = 'CVE-2012-2135: Vulnerability in the utf-16 decoder after error handling'
    updated_at = <Date 2012-07-21.06:16:09.182>
    user = 'https://github.com/serhiy-storchaka'

    bugs.python.org fields:

    activity = <Date 2012-07-21.06:16:09.182>
    actor = 'serhiy.storchaka'
    assignee = 'none'
    closed = True
    closed_date = <Date 2012-07-20.22:54:34.095>
    closer = 'pitrou'
    components = ['Interpreter Core', 'Unicode']
    creation = <Date 2012-04-14.18:46:02.989>
    creator = 'serhiy.storchaka'
    dependencies = []
    files = ['25213', '25214', '25276', '25294', '25295', '25352', '25366', '25367', '26424']
    hgrepos = []
    issue_num = 14579
    keywords = ['patch']
    message_count = 26.0
    messages = ['158272', '158743', '158758', '158760', '158810', '158818', '158821', '158869', '159135', '159138', '159212', '159257', '159258', '159266', '159317', '159320', '159325', '159362', '159363', '159411', '165743', '165984', '165985', '165986', '165987', '165994']
    nosy_count = 12.0
    nosy_names = ['loewis', 'georg.brandl', 'pitrou', 'vstinner', 'benjamin.peterson', 'ezio.melotti', 'Arfrever', 'asvetlov', 'Henri.Salo', 'python-dev', 'Huzaifa.Sidhpurwala', 'serhiy.storchaka']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'security'
    url = 'https://bugs.python.org/issue14579'
    versions = ['Python 2.7', 'Python 3.2', 'Python 3.3']

    @serhiy-storchaka
    Copy link
    Member Author

    In the utf-16 decoder after calling unicode_decode_call_errorhandler aligned_end is not updated. This may potentially cause data leaks, memory damage, and crash. The bug introduced by implementation of the issue bpo-4868. In a similar situation in the utf-8 decoder aligned_end is updated.

    @serhiy-storchaka serhiy-storchaka added the type-security A security issue label Apr 14, 2012
    @serhiy-storchaka
    Copy link
    Member Author

    There is the crasher and leaker. When Python is not crashing, there is garbage (i.e. leakage of data) at the end of the decoded string. Indeed, I see an English text in some versions of Python.

    There are many other errors in utf-16 decoder (see, for example, b'\xD8\x00\xDC'.decode('utf-16be')). I'm now finishing work on a new decoder, and after that take the bug fixing in 3.2.

    @serhiy-storchaka
    Copy link
    Member Author

    Here is the bugs in the utf-16 decoder:

    1. aligned_end is not updated after calling error handler.

    2. Possible silent reading of one byte over the bytes array limit when decoding of a surrogate pair. b'\xD8\x00\xDC'.decode('utf-16be')

    3. Error handlers receive data without last byte.

    4. After handling truncate data error it is impossible to continue decoding (unlike all the other decoders).

    @serhiy-storchaka serhiy-storchaka changed the title Possible vulnerability in the utf-16 decoder after error handling Vulnerability in the utf-16 decoder after error handling Apr 19, 2012
    @serhiy-storchaka
    Copy link
    Member Author

    The proposed patch will fix only the first of these bugs. The patch in issue bpo-14624 fixes all bugs for Python 3.3. For Python 3.2 soon I will make a patch.

    @loewis
    Copy link
    Mannequin

    loewis mannequin commented Apr 20, 2012

    [moving from Rietveld back to Roundup]

    On 2012/04/20 11:15:48, storchaka wrote:

    The aligned_end may point outside unicode object,
    if the unicode object was reallocated.

    How so? The aligned_end *never* points into the unicode object:

    q = (unsigned char *)s;
    e = q + size - 1;
    aligned_end = (const unsigned char *) ((size_t) e & ~LONG_PTR_MASK);

    So aligned_end points into s, not into the unicode object.
    So this adjustment is necessary because the *input* may change in the callback,
    not because the output may change. So the comment in decode_utf8_errors seems
    just as wrong.

    Why this is relevant to this issue, is unclear to me, though: the ignore handler
    doesn't modify the input object.

    @serhiy-storchaka
    Copy link
    Member Author

    So this adjustment is necessary because the *input* may change in the callback,
    not because the output may change. So the comment in decode_utf8_errors seems
    just as wrong.

    You're right, and my eyes in a lather. Now I saw it.

    What you have to offer any comment? If someone would correct a comment
    for decode_utf8_errors, I just copied it.

    Why this is relevant to this issue, is unclear to me, though: the ignore handler
    doesn't modify the input object.

    I first got the crash using a custom handler, and then I saw that
    "ignore" handler is enough. Even if the "ignore" handler does not have
    to change the input object, other handlers can do it and this is the
    reason for the crash remains.

    @loewis
    Copy link
    Mannequin

    loewis mannequin commented Apr 20, 2012

    You're right, and my eyes in a lather. Now I saw it.

    What you have to offer any comment? If someone would correct a comment
    for decode_utf8_errors, I just copied it.

    "might have changed the input object"

    > Why this is relevant to this issue, is unclear to me, though: the ignore handler
    > doesn't modify the input object.

    I first got the crash using a custom handler, and then I saw that
    "ignore" handler is enough. Even if the "ignore" handler does not have
    to change the input object, other handlers can do it and this is the
    reason for the crash remains.

    I agree that the change is necessary. It just does not explain why it
    fixes this issue.

    @serhiy-storchaka
    Copy link
    Member Author

    Here is a minimal patch that corrects all bugs for 3.2. As a side effect, decoding is accelerated by 4-8%.

    @loewis
    Copy link
    Mannequin

    loewis mannequin commented Apr 24, 2012

    Now I see the problem: make_decode_exception creates a new bytes object in any case, regardless of whether the error handler will update it or not. Therefore, decoding will continue in this new bytes object.

    I think the same issue also applies to the ASCII decoder in 3.3.

    @serhiy-storchaka
    Copy link
    Member Author

    I think the same issue also applies to the ASCII decoder in 3.3.

    No, the ASCII decoder is not affected by this vulnerability. In a loop,
    in which unicode_decode_call_errorhandler is called, do not use any
    cached and not-updatable data.

    @serhiy-storchaka
    Copy link
    Member Author

    Here is a patch, which took into account the Martin suggestions.

    @serhiy-storchaka serhiy-storchaka changed the title Vulnerability in the utf-16 decoder after error handling Possible vulnerability in the utf-16 decoder after error handling Apr 24, 2012
    @serhiy-storchaka serhiy-storchaka changed the title Possible vulnerability in the utf-16 decoder after error handling Vulnerability in the utf-16 decoder after error handling Apr 24, 2012
    @kseifriedredhatcom
    Copy link
    Mannequin

    kseifriedredhatcom mannequin commented Apr 25, 2012

    Please use CVE-2012-2135 for this issue as per http://www.openwall.com/lists/oss-security/2012/04/25/3

    @HuzaifaSidhpurwala
    Copy link
    Mannequin

    HuzaifaSidhpurwala mannequin commented Apr 25, 2012

    I have not tried the patch yet, but modifying the reproducer yields a different crash. This one seems to be a heap-based buffer overflow which is slightly more serious.

    In the reproducer, you just need to replace ascii() with str().

    Again works on python3 only.

    @serhiy-storchaka
    Copy link
    Member Author

    I now write tests and I have a question. Should b'\xd8\x00\x41'.decode('utf-16be', 'replace') to give '\xfffd' or '\xfffd\xfffd'?

    @HenriSalo
    Copy link
    Mannequin

    HenriSalo mannequin commented Apr 25, 2012

    Debian bug-report: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=670389
    Found in versions python3-defaults/3.2.3~rc1-2, python3-defaults/3.1.3-12+squeeze1

    @HenriSalo
    Copy link
    Mannequin

    HenriSalo mannequin commented Apr 25, 2012

    I tested versions 3.1.1, 3.1.2, 3.1.3, 3.1.4 and 3.1.5 and only 3.1.3 crashed with Segmentation fault:

    Program received signal SIGSEGV, Segmentation fault.
    0x00000000004c483a in PyObject_Call (func=0x7ffff7e4d3b0, arg=0x7ffff70fd410, kw=0x0) at Objects/abstract.c:2156
    2156 if ((call = func->ob_type->tp_call) != NULL) {

    (gdb) bt
    #0 0x00000000004c483a in PyObject_Call (func=0x7ffff7e4d3b0, arg=0x7ffff70fd410, kw=0x0) at Objects/abstract.c:2156
    #1 0x000000000045c437 in do_call (f=0x8929b0, throwflag=<value optimized out>) at Python/ceval.c:3982
    #2 call_function (f=0x8929b0, throwflag=<value optimized out>) at Python/ceval.c:3785
    #3 PyEval_EvalFrameEx (f=0x8929b0, throwflag=<value optimized out>) at Python/ceval.c:2548
    #4 0x000000000045e675 in PyEval_EvalCodeEx (co=0x7ffff7159e30, globals=<value optimized out>, locals=<value optimized out>, args=0x0, argcount=1, kws=<value optimized out>,
    kwcount=0, defs=0x0, defcount=0, kwdefs=0x0, closure=0x0) at Python/ceval.c:3198
    #5 0x000000000045e77b in PyEval_EvalCode (co=0x7ffff7e4d3b0, globals=0x7ffff70fd410, locals=0x0) at Python/ceval.c:668
    #6 0x00000000004800b2 in run_mod (fp=<value optimized out>, filename=<value optimized out>, flags=0x7fffffffe390) at Python/pythonrun.c:1711
    #7 PyRun_InteractiveOneFlags (fp=<value optimized out>, filename=<value optimized out>, flags=0x7fffffffe390) at Python/pythonrun.c:1104
    #8 0x00000000004803ce in PyRun_InteractiveLoopFlags (fp=0x7ffff75346a0, filename=0x5312a1 "<stdin>", flags=0x7fffffffe390) at Python/pythonrun.c:1006
    #9 0x0000000000480bab in PyRun_AnyFileExFlags (fp=0x7ffff75346a0, filename=0x5312a1 "<stdin>", closeit=0, flags=0x7fffffffe390) at Python/pythonrun.c:975
    #10 0x0000000000496422 in Py_Main (argc=<value optimized out>, argv=<value optimized out>) at Modules/main.c:607
    #11 0x0000000000416e6e in main (argc=<value optimized out>, argv=<value optimized out>) at ./Modules/python.c:152

    @serhiy-storchaka
    Copy link
    Member Author

    I thought it was one error, and not two.

    The updated patch adds tests and fixes minor mistake. 2.7 is not
    affected by main security issue, but it contains one of mentioned bugs
    (read 1 byte outside of the input array). A patch for 2.7 fixes this bug
    and also includes tests.

    @loewis loewis mannequin changed the title Vulnerability in the utf-16 decoder after error handling CVE-2012-2135: Vulnerability in the utf-16 decoder after error handling Apr 25, 2012
    @vstinner
    Copy link
    Member

    I ran tests of utf16_error_handling-3.2_4.patch on Python 3.1. Two tests are failing:

    • b'\x00\xd8'.decode('utf-16le', 'replace')='\ufffd\ufffd' != '\ufffd'
    • b'\xd8\x00'.decode('utf-16be', 'replace')='\ufffd\ufffd' != '\ufffd'

    I don't think that the test is correct: UTF-16 should resynchronize as early as possible (ignore the first invalid byte and restart at the following byte), so '\ufffd\ufffd' is the correct answer.

    Another examples:

    • b'\xd8\x00\x41'.decode('utf-16be', 'replace') should return '�A' (\ufffdA')
    • with UTF-8 decoder: (b'\xC3' + '\xe9'.encode('utf-8')).decode('utf-8', 'replace') returns '\ufffd\xe9'

    @pitrou
    Copy link
    Member

    pitrou commented Apr 26, 2012

    I ran tests of utf16_error_handling-3.2_4.patch on Python 3.1. Two tests are failing:

    • b'\x00\xd8'.decode('utf-16le', 'replace')='\ufffd\ufffd' != '\ufffd'
    • b'\xd8\x00'.decode('utf-16be', 'replace')='\ufffd\ufffd' != '\ufffd'

    I don't think that the test is correct: UTF-16 should resynchronize as
    early as possible (ignore the first invalid byte and restart at the
    following byte), so '\ufffd\ufffd' is the correct answer.

    UTF-16 units are 16-bit words, not bytes, so '\uffffd' sounds correct to
    me. You resynchronize on the word boundary: the invalid word is skipped.

    • with UTF-8 decoder: (b'\xC3' +
      '\xe9'.encode('utf-8')).decode('utf-8', 'replace') returns '\ufffd
      \xe9'

    That's because UTF-8 operates on bytes: the invalid byte is skipped.

    @loewis
    Copy link
    Mannequin

    loewis mannequin commented Apr 26, 2012

    UTF-16 units are 16-bit words, not bytes, so '\uffffd' sounds correct to
    me. You resynchronize on the word boundary: the invalid word is skipped.

    I agree. The only odd case is when the number of bytes is not even
    (pun intended). In that case, anybody can guess which of the bytes is
    extra. The most natural (IMO) assumption is that the data is truncated,
    so it would be the last byte which is extra.

    @serhiy-storchaka
    Copy link
    Member Author

    Please, can anyone do a final review and commit?

    Here are three patches for three Python versions:

    2.7: utf16_error_handling-2.7.patch. Fix for one minor bug (overreading) and tests.

    3.2: utf16_error_handling-3.2_4.patch. Fix for one critical security bug (CVE-2012-2135) and several minor bugs, tests.

    3.3: utf16_error_handling-3.3.patch. Only tests.

    @serhiy-storchaka serhiy-storchaka added interpreter-core (Objects, Python, Grammar, and Parser dirs) topic-unicode labels Jul 18, 2012
    @pitrou
    Copy link
    Member

    pitrou commented Jul 20, 2012

    There are spurious print() calls in the 2.7 patch.

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Jul 20, 2012

    New changeset 034ff986019d by Antoine Pitrou in branch '3.2':
    Issue bpo-14579: Fix CVE-2012-2135: vulnerability in the utf-16 decoder after error handling.
    http://hg.python.org/cpython/rev/034ff986019d

    New changeset 118fe0ee6921 by Antoine Pitrou in branch 'default':
    Port additional tests from bpo-14579 (the issue is already fixed).
    http://hg.python.org/cpython/rev/118fe0ee6921

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Jul 20, 2012

    New changeset 4cadf91aaddd by Antoine Pitrou in branch '2.7':
    Issue bpo-14579: Fix error handling bug in the utf-16 decoder.
    http://hg.python.org/cpython/rev/4cadf91aaddd

    @pitrou
    Copy link
    Member

    pitrou commented Jul 20, 2012

    Thanks for the patches, Serhiy! They're now pushed.

    @pitrou pitrou closed this as completed Jul 20, 2012
    @serhiy-storchaka
    Copy link
    Member Author

    There are spurious print() calls in the 2.7 patch.

    Oh, my inattentiveness. Thank you for pushing, Antoine. And thank Martin for
    review.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    interpreter-core (Objects, Python, Grammar, and Parser dirs) topic-unicode type-security A security issue
    Projects
    None yet
    Development

    No branches or pull requests

    3 participants