Unsigned Integer Overflow in sre_lib.h #68754

Closed
bee13oy mannequin opened this issue Jul 5, 2015 · 10 comments

@bee13oy
Mannequin

bee13oy mannequin commented Jul 5, 2015

BPO 24566
Nosy @vstinner, @ezio-melotti, @serhiy-storchaka
Superseder
  • bpo-18684: Pointers point out of array bound in _sre.c
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    GitHub fields:

    assignee = 'https://github.com/serhiy-storchaka'
    closed_at = <Date 2015-07-06.10:52:21.757>
    created_at = <Date 2015-07-05.02:49:21.905>
    labels = ['expert-regex', 'type-crash']
    title = 'Unsigned Integer Overflow in sre_lib.h'
    updated_at = <Date 2015-07-07.13:34:34.611>
    user = 'https://bugs.python.org/bee13oy'

    bugs.python.org fields:

    activity = <Date 2015-07-07.13:34:34.611>
    actor = 'serhiy.storchaka'
    assignee = 'serhiy.storchaka'
    closed = True
    closed_date = <Date 2015-07-06.10:52:21.757>
    closer = 'serhiy.storchaka'
    components = ['Regular Expressions']
    creation = <Date 2015-07-05.02:49:21.905>
    creator = 'bee13oy'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 24566
    keywords = []
    message_count = 10.0
    messages = ['246290', '246294', '246313', '246314', '246325', '246339', '246347', '246354', '246413', '246415']
    nosy_count = 6.0
    nosy_names = ['vstinner', 'ezio.melotti', 'mrabarnett', 'BreamoreBoy', 'serhiy.storchaka', 'bee13oy']
    pr_nums = []
    priority = 'normal'
    resolution = 'duplicate'
    stage = 'resolved'
    status = 'closed'
    superseder = '18684'
    type = 'crash'
    url = 'https://bugs.python.org/issue24566'
    versions = ['Python 2.7', 'Python 3.4', 'Python 3.5', 'Python 3.6']

    @bee13oy
    Mannequin Author

    bee13oy mannequin commented Jul 5, 2015

    I found an Unsigned Integer Overflow in sre_lib.h.

    Tested on English Windows 7 x86 with Python 3.4.3 and Python 3.5.0b2.

    Crash:
    ------
    (1a84.16b0): Access violation - code c0000005 (!!! second chance !!!)
    eax=00000002 ebx=0038f40c ecx=00000002 edx=0526cbb8 esi=83e0116b edi=c3e011eb
    eip=58bcfa53 esp=0038f384 ebp=0038f394 iopl=0 nv up ei ng nz na po cy
    cs=0023 ss=002b ds=002b es=002b fs=0053 gs=002b efl=00010283
    python35+0x1fa53:
    58bcfa53 380e cmp byte ptr [esi],cl ds:002b:83e0116b=??

    code:
    ------
    58bcfa3d 8b4a04 mov ecx,dword ptr [edx+4]
    58bcfa40 0fb6c1 movzx eax,cl
    58bcfa43 3bc1 cmp eax,ecx
    58bcfa45 0f8593000000 jne python35+0x1fade (58bcfade)
    58bcfa4b 3bf7 cmp esi,edi
    58bcfa4d 0f838b000000 jae python35+0x1fade (58bcfade)
    58bcfa53 380e cmp byte ptr [esi],cl ds:002b:83e0116b=??
    58bcfa55 0f8583000000 jne python35+0x1fade (58bcfade)

    stack:
    ------
    0:000> kb
    ChildEBP RetAddr Args to Child
    WARNING: Stack unwind information not available. Following frames may be wrong.
    0038f394 58bcfedf 40000080 0038f40c 83e0116c python35+0x1fa53
    0038f3c0 58bd0f58 0000000 06016508 0526cb60 python35+0x1fedf
    0038f400 58bd5039 58e40c58 83e0116b 03e01158 python35+0x20f58
    0038f480 58bd76b2 0000000 7fffffff 0000000 python35+0x25039
    0038f4a4 58c925cf 0526cb60 0528a4d0 0000000 python35+0x276b2
    0038f4c4 58cf3633 06016508 0528a4d0 0000000 python35!PyCFunction_Call+0x2f
    0038f4f 58cf0b05 05840f90 03e0ab90 00000001 python35!PyEval_GetFuncDesc+0x373
    0038f570 58cf379 03e0ab90 0000000 00000001 python35!PyEval_EvalFrameEx+0x22d5
    0038f594 58cf3692 00000001 00000001 0000000 python35!PyEval_GetFuncDesc+0x4d1
    0038f5c8 58cf0b05 03e08de0 0012e850 0000000 python35!PyEval_GetFuncDesc+0x3d2
    0038f640 58cf25bb 0012e850 0000000 065feff0 python35!PyEval_EvalFrameEx+0x22d5
    0038f68c 58d29302 03dcfaa8 0000000 0000000 python35!PyEval_EvalFrameEx+0x3d8b
    0038f6c8 58d29195 03dcfaa8 03dcfaa8 0038f790 python35!PyRun_FileExFlags+0x1f2
    0038f6f4 58d2820a 05994fc8 052525a8 00000101 python35!PyRun_FileExFlags+0x85
    0038f738 58bfe9f7 05994fc8 052525a8 00000001 python35!PyRun_SimpleFileExFlags+0x20a
    0038f764 58bff32b 0038f790 5987b648 5987cc94 python35!Py_hashtable_copy+0x5e17
    0038f808 1c6f11df 00000003 05796f70 05210f50 python35!Py_Main+0x90b

    source code:

    LOCAL(Py_ssize_t)
    SRE(search)(SRE_STATE* state, SRE_CODE* pattern)
    {
        SRE_CHAR* ptr = (SRE_CHAR *)state->start;
        SRE_CHAR* end = (SRE_CHAR *)state->end;
        Py_ssize_t status = 0;
        Py_ssize_t prefix_len = 0;
        Py_ssize_t prefix_skip = 0;
        SRE_CODE* prefix = NULL;
        SRE_CODE* charset = NULL;
        SRE_CODE* overlap = NULL;
        int flags = 0;
    
        if (pattern[0] == SRE_OP_INFO) {
            /* optimization info block */
            /* <INFO> <1=skip> <2=flags> <3=min> <4=max> <5=prefix info>  */
            flags = pattern[2];
            if (pattern[3] > 1) {
                /* adjust end point (but make sure we leave at least one
                   character in there, so literal search will work) */
                end -= pattern[3] - 1;
                if (end <= ptr)
                    end = ptr;
            }
            ...
        }

        ...

        } else
            /* general case */
            while (ptr <= end) {
                TRACE(("|%p|%p|SEARCH\n", pattern, ptr));
                state->start = state->ptr = ptr++;
                status = SRE(match)(state, pattern, 0);
                if (status != 0)
                    break;
            }
    }
    
    SRE(count)(SRE_STATE* state, SRE_CODE* pattern, Py_ssize_t maxcount)
    {
        SRE_CODE chr;
        SRE_CHAR c;
        SRE_CHAR* ptr = (SRE_CHAR *)state->ptr;
        SRE_CHAR* end = (SRE_CHAR *)state->end;
        Py_ssize_t i;
    
        /* adjust end */
        if (maxcount < end - ptr && maxcount != SRE_MAXREPEAT)
            end = ptr + maxcount;
    		
        ...
    	
    #if SIZEOF_SRE_CHAR < 4
            if ((SRE_CODE) c != chr)
                ; /* literal can't match: doesn't fit in char width */
            else
    #endif
            while (ptr < end && *ptr == c) // crash here: ptr points to unreadable memory
                ptr++;
            break;
    }
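
The register dump looks consistent with the end adjustment in SRE(count), if esi is read as ptr and edi as end. That mapping is an assumption, based on the `cmp esi,edi` / `jae` pair that precedes the faulting byte compare. A quick check in Python, using only values printed in the dump and the repeat count from the reporter's pattern:

```python
# Register values copied from the crash dump in the report.
esi = 0x83E0116B  # assumed to hold ptr (the address of the faulting read)
edi = 0xC3E011EB  # assumed to hold end after "end = ptr + maxcount"
ecx = 0x00000002  # low byte is the literal being compared, chr(2) from [\2]

maxcount = 1073741952  # the {1073741952} repeat count in the reported pattern

# If end was set to ptr + maxcount, the two registers differ by maxcount.
distance = edi - esi
print(hex(distance), distance == maxcount)
```

Under that reading, the distance between the two registers equals the repeat count, which suggests the crash happens on the first byte read after the adjustment.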

    poc code:
    ---cut----

    import re

    pattern = "([\\2]{1073741952})"
    regexp = re.compile(r''+pattern+'')
    sgroup = regexp.search(pattern)

    ---cut---
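
Because the proof of concept takes down the interpreter on affected builds, it can be convenient to run it in a child process and look at the exit code. The following is a minimal sketch, not part of the original report; `run_isolated` and `POC` are hypothetical names introduced here for illustration.

```python
import subprocess
import sys

# The proof of concept from the report, kept as a string so that it only
# runs inside a child interpreter, never in the calling process.
POC = r'''
import re
pattern = "([\\2]{1073741952})"
regexp = re.compile(pattern)
regexp.search(pattern)
'''


def run_isolated(source, timeout=None):
    """Run Python source in a child interpreter and return its exit code."""
    result = subprocess.run([sys.executable, "-c", source], timeout=timeout)
    return result.returncode
```

Calling `run_isolated(POC)` returns 0 if the child finishes normally. A non-zero value would indicate that the child interpreter died or raised, which is what the report describes for 32-bit Windows builds without the fix.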

    1.) In SRE(search), pattern[3] is 1073741952 (0x40000080). The code does not limit this value, so the subtraction leaves the end pointer at an invalid, very large address (greater than ptr).
    2.) The program then runs while (ptr <= end) { state->start = state->ptr = ptr++; ... }, but state->end still holds the original value.
    3.) After a while, execution reaches SRE(count), which adjusts end. There, end - ptr = 0x7fffffff, which is larger than 0x40000080, and ptr already points to an invalid address.
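
The steps above can be sketched with 32-bit arithmetic in Python. The addresses are assumptions: the string start is taken from the stack trace, the end is placed 18 one-byte characters later (the length of the searched string), and the crashing ptr is the esi value from the register dump. `wrap32` and `as_signed32` are helper names made up for this sketch.

```python
MASK32 = 0xFFFFFFFF  # pointers are 32 bits wide on the x86 build in the report


def wrap32(value):
    """Reduce an integer to an unsigned 32-bit value, as x86 pointers wrap."""
    return value & MASK32


def as_signed32(value):
    """Reinterpret a 32-bit value as a signed 32-bit difference."""
    value &= MASK32
    return value - (1 << 32) if value & 0x80000000 else value


MIN_WIDTH = 1073741952  # pattern[3] in the report

# Assumed addresses, see the paragraph above.
start = 0x03E01158
state_end = start + 18

# Step 1: "end -= pattern[3] - 1" wraps around instead of dropping below ptr,
# so the "if (end <= ptr)" guard does not clamp it.
search_end = wrap32(state_end - (MIN_WIDTH - 1))
clamped = search_end <= start

# Step 3: once ptr has run far past state->end, the signed difference
# end - ptr wraps around to a large positive number.
crash_ptr = 0x83E0116B
remaining = as_signed32(state_end - crash_ptr)

print(hex(search_end), clamped, hex(remaining), remaining > MIN_WIDTH)
```

With these assumed addresses, `remaining` comes out as 0x7fffffff, the figure quoted in step 3, and it exceeds the repeat count, so a length check based on end - ptr would pass even though ptr is far outside the string.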

    @bee13oy bee13oy mannequin added the topic-regex label Jul 5, 2015
    @ezio-melotti ezio-melotti added the type-crash A hard crash of the interpreter, possibly with a core dump label Jul 5, 2015
    @serhiy-storchaka
    Member

    Does the patch for bpo-18684 fix this issue?

    @bee13oy
    Mannequin Author

    bee13oy mannequin commented Jul 5, 2015

    I didn't test that patch. I found this bug in Python 3.4.3 by fuzzing the re module, and then tested Python 3.5.0b2 on Windows 7 x86; it has the same problem.

    @bee13oy
    Mannequin Author

    bee13oy mannequin commented Jul 5, 2015

    I have just tested Python 2.7.10 on Windows 7 x86 with the PoC code; it also crashes.

    @serhiy-storchaka
    Member

    Not having Windows I can't reproduce the crash. Someone should test if the patch for bpo-18684 fixes this issue and doesn't introduce other regressions.

    @BreamoreBoy
    Mannequin

    BreamoreBoy mannequin commented Jul 5, 2015

    Fixed by the patch on bpo-18684, see also my comments there.

    @bee13oy
    Mannequin Author

    bee13oy mannequin commented Jul 6, 2015

    I tested this patch, and it really does fix this issue. But Python 2.7.10 was released on May 23, 2015, and this patch was created on March 22, 2015. Does that mean Python 2.7.10 and 3.5.0b2 were built and released without this patch applied?

    @serhiy-storchaka
    Member

    Yes, this patch was not applied because it had no visible effect on Linux. Now, with your report, there is a case on Windows.

    @bee13oy
    Mannequin Author

    bee13oy mannequin commented Jul 7, 2015

    Thank you. I got it.


    @serhiy-storchaka
    Member

    Thank you for your report. Without your example the patch would have been postponed indefinitely.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022