Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

time_strftime() Buffer Over-read #69105

Closed
JohnLeitch mannequin opened this issue Aug 22, 2015 · 67 comments
Closed

time_strftime() Buffer Over-read #69105

JohnLeitch mannequin opened this issue Aug 22, 2015 · 67 comments
Assignees
Labels
deferred-blocker extension-modules C modules in the Modules dir OS-windows type-security A security issue

Comments

@JohnLeitch
Copy link
Mannequin

JohnLeitch mannequin commented Aug 22, 2015

BPO 24917
Nosy @malemburg, @birkenfeld, @pfmoore, @abalkin, @vstinner, @larryhastings, @tjguk, @zware, @eryksun, @zooba
Superseder
  • bpo-25029: MemoryError in test_strptime
  • Files
  • time_strftime_Buffer_Over-read.patch: patch
  • time_strftime_Buffer_Over-read.py: repro
  • time_strftime_Buffer_Over-read_v2.patch: patch
  • time_strftime_Buffer_Over-read_v3.patch
  • 24917_4.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/zooba'
    closed_at = <Date 2015-09-08.15:15:28.839>
    created_at = <Date 2015-08-22.23:39:19.706>
    labels = ['type-security', 'extension-modules', 'deferred-blocker', 'OS-windows']
    title = 'time_strftime() Buffer Over-read'
    updated_at = <Date 2015-09-08.15:15:28.839>
    user = 'https://bugs.python.org/JohnLeitch'

    bugs.python.org fields:

    activity = <Date 2015-09-08.15:15:28.839>
    actor = 'steve.dower'
    assignee = 'steve.dower'
    closed = True
    closed_date = <Date 2015-09-08.15:15:28.839>
    closer = 'steve.dower'
    components = ['Extension Modules', 'Windows']
    creation = <Date 2015-08-22.23:39:19.706>
    creator = 'JohnLeitch'
    dependencies = []
    files = ['40228', '40229', '40367', '40368', '40383']
    hgrepos = []
    issue_num = 24917
    keywords = ['patch']
    message_count = 67.0
    messages = ['248998', '249800', '249801', '249805', '249806', '249857', '249860', '249861', '249862', '249863', '249864', '249867', '249868', '249870', '249872', '249874', '249876', '249877', '249878', '249881', '249883', '249884', '249899', '249904', '249905', '249908', '249919', '249920', '249950', '249956', '249958', '249960', '249961', '249962', '249963', '249964', '249965', '249966', '249967', '249969', '249970', '249971', '249972', '249973', '249974', '249975', '249976', '249977', '249994', '250014', '250020', '250022', '250023', '250024', '250030', '250031', '250035', '250039', '250040', '250041', '250042', '250062', '250064', '250071', '250202', '250203', '250213']
    nosy_count = 14.0
    nosy_names = ['lemburg', 'georg.brandl', 'paul.moore', 'belopolsky', 'vstinner', 'larry', 'tim.golden', 'BreamoreBoy', 'python-dev', 'zach.ware', 'eryksun', 'steve.dower', 'JohnLeitch', 'brycedarling']
    pr_nums = []
    priority = 'deferred blocker'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = '25029'
    type = 'security'
    url = 'https://bugs.python.org/issue24917'
    versions = ['Python 3.4', 'Python 3.5']

    @JohnLeitch
    Copy link
    Mannequin Author

    JohnLeitch mannequin commented Aug 22, 2015

    Python 3.5 suffers from a vulnerability caused by the behavior of the time_strftime() function. When called, the function loops over the format string provided, using strchr to search for each instance of '%'. After finding a '%', it continues to search two characters ahead, assuming that each instance is the beginning of a well formed format string token.

    However, if a string ends with '%', this logic will result in a call to strchr that reads off the end of the format string buffer:

        /* check that the format string contains only valid directives */
        for(outbuf = strchr(fmt, '%'); <<<< Assuming fmt ends with a '%', this will return a pointer to the end of the string.
            outbuf != NULL;
            outbuf = strchr(outbuf+2, '%')) <<<< Once outbuf is pointing to the end of the string, outbuf+2 skips
        {                                        past the null terimnator, leading to a buffer over-read.
            if (outbuf[1]=='#')
                ++outbuf; /* not documented by python, */
            if ((outbuf[1] == 'y') && buf.tm_year < 0)
            {
                PyErr_SetString(PyExc_ValueError,
                            "format %y requires year >= 1900 on Windows");
                Py_DECREF(format);
                return NULL;
            }
        }

    In some applications, it may be possible to exploit this behavior to disclose the contents of adjacent memory. The buffer over-read can be observed by running the following script:

    from time import *
    strftime("AA%"*0x10000)

    Which, depending on the arrangement of memory, may produce an exception such as this:

    0:000> g
    (20b8.18d4): Access violation - code c0000005 (first chance)
    First chance exceptions are reported before any exception handling.
    This exception may be expected and handled.
    eax=ffffffff ebx=52c1a6a0 ecx=00000000 edx=08ef3000 esi=08ec2fe8 edi=08ec2ff8
    eip=52d254f3 esp=004cf9d4 ebp=004cfa58 iopl=0 nv up ei pl nz na pe nc
    cs=0023 ss=002b ds=002b es=002b fs=0053 gs=002b efl=00010206
    python35!strchr+0x33:
    52d254f3 f30f6f0a movdqu xmm1,xmmword ptr [edx] ds:002b:08ef3000=????????????????????????????????
    0:000> db edx-0x10
    08ef2ff0 41 25 41 41 25 41 41 25-00 d0 d0 d0 d0 d0 d0 d0 A%AA%AA%........
    08ef3000 ?? ?? ?? ?? ?? ?? ?? ??-?? ?? ?? ?? ?? ?? ?? ?? ????????????????
    08ef3010 ?? ?? ?? ?? ?? ?? ?? ??-?? ?? ?? ?? ?? ?? ?? ?? ????????????????
    08ef3020 ?? ?? ?? ?? ?? ?? ?? ??-?? ?? ?? ?? ?? ?? ?? ?? ????????????????
    08ef3030 ?? ?? ?? ?? ?? ?? ?? ??-?? ?? ?? ?? ?? ?? ?? ?? ????????????????
    08ef3040 ?? ?? ?? ?? ?? ?? ?? ??-?? ?? ?? ?? ?? ?? ?? ?? ????????????????
    08ef3050 ?? ?? ?? ?? ?? ?? ?? ??-?? ?? ?? ?? ?? ?? ?? ?? ????????????????
    08ef3060 ?? ?? ?? ?? ?? ?? ?? ??-?? ?? ?? ?? ?? ?? ?? ?? ????????????????
    0:000> k5
    ChildEBP RetAddr
    004cf9d0 52c1a7f python35!strchr+0x33 [f:\dd\vctools\crt\vcruntime\src\string\i386\strchr_sse.inc @ 75]
    004cfa58 52c832d3 python35!time_strftime+0x156 [c:\build\cpython\modules\timemodule.c @ 615]
    004cfa74 52ce442f python35!PyCFunction_Call+0x113 [c:\build\cpython\objects\methodobject.c @ 109]
    004cfaa8 52ce18ec python35!call_function+0x2ff [c:\build\cpython\python\ceval.c @ 4651]
    004cfb20 52ce339f python35!PyEval_EvalFrameEx+0x232c [c:\build\cpython\python\ceval.c @ 3184]

    To fix this issue, it is recommended that time_strftime() be updated to check outputbuf[1] for null in the body of the format string directive validation loop. A proposed patch is attached.

    Credit: John Leitch (johnleitch@outlook.com), Bryce Darling (darlingbryce@gmail.com)

    @JohnLeitch JohnLeitch mannequin added the type-security A security issue label Aug 22, 2015
    @abalkin
    Copy link
    Member

    abalkin commented Sep 4, 2015

    Is there a CERT report associated with this vulnerability?

    @JohnLeitch
    Copy link
    Mannequin Author

    JohnLeitch mannequin commented Sep 4, 2015

    Currently, no. Would you like us to report this and future vulnerabilities to CERT?

    @abalkin
    Copy link
    Member

    abalkin commented Sep 4, 2015

    This is happening in windows-only code, so not being a windows user I cannot reproduce it. It does not look like something new. This code was last touched in bpo-10653.

    Can someone confirm that only 3.5 is affected?

    @abalkin
    Copy link
    Member

    abalkin commented Sep 4, 2015

    Would you like us to report this and future vulnerabilities to CERT?

    If it affects released versions, I think this is the right thing to do. Note that I don't know what PSF policy is on this is or whether they even have one.

    @BreamoreBoy
    Copy link
    Mannequin

    BreamoreBoy mannequin commented Sep 4, 2015

    I have tried the reproducer on Windows 10 with 2.6, 2.7, 3.3, 3.4, 3.5 and 3.6. In every case I got this.

    C:\py -3.4 -c"from time import *;strftime('AA%'*0x10000)"
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
    ValueError: Invalid format string

    @abalkin
    Copy link
    Member

    abalkin commented Sep 4, 2015

    In every case I got [ValueError]

    You shouldn't have. I am no Windows expert, but I suspect something is wrong with your use of the command line. Please try it at the Python prompt or put the code in a script.

    @JohnLeitch
    Copy link
    Mannequin Author

    JohnLeitch mannequin commented Sep 4, 2015

    I have tried the reproducer on Windows 10 with 2.6, 2.7, 3.3, 3.4, 3.5 and 3.6. In every case I got this.

    What you are observing is due to the arrangement and contents of process memory. With a simple repro (such as the one provided), there's a good chance the null terminator of the format string will be followed by more null bytes, and thus the code will appear to work as intended. In more complex scripts where memory is ultimately reused, it's more likely that the null terminator will be followed by garbage, non-null bytes.

    To make the issue reproduce more reliably, use GFlags to enable heap tail checking, heap free checking, and page heap.

    https://msdn.microsoft.com/en-us/library/windows/hardware/ff549557(v=vs.85).aspx

    Then, when you repro the issue, you'll see the crash because the uninitialized memory will contain the fill pattern 0xd0 rather than 0x00, like this:

    0:000> db edx-0x10
    08ef2ff0 41 25 41 41 25 41 41 25-00 d0 d0 d0 d0 d0 d0 d0 A%AA%AA%........
    08ef3000 ?? ?? ?? ?? ?? ?? ?? ??-?? ?? ?? ?? ?? ?? ?? ?? ????????????????
    08ef3010 ?? ?? ?? ?? ?? ?? ?? ??-?? ?? ?? ?? ?? ?? ?? ?? ????????????????
    08ef3020 ?? ?? ?? ?? ?? ?? ?? ??-?? ?? ?? ?? ?? ?? ?? ?? ????????????????
    08ef3030 ?? ?? ?? ?? ?? ?? ?? ??-?? ?? ?? ?? ?? ?? ?? ?? ????????????????
    08ef3040 ?? ?? ?? ?? ?? ?? ?? ??-?? ?? ?? ?? ?? ?? ?? ?? ????????????????
    08ef3050 ?? ?? ?? ?? ?? ?? ?? ??-?? ?? ?? ?? ?? ?? ?? ?? ????????????????
    08ef3060 ?? ?? ?? ?? ?? ?? ?? ??-?? ?? ?? ?? ?? ?? ?? ?? ????????????????

    To be clear, heap verification is not a requirement--the bug can indeed be reproduced without it. However, it will make life easier by introducing more determinism.

    @BreamoreBoy
    Copy link
    Mannequin

    BreamoreBoy mannequin commented Sep 4, 2015

    Python prompt or script makes no difference.

    C:\Users\Mark\Desktop>py -2.6 time_strftime_Buffer_Over-read.py
    Traceback (most recent call last):
      File "time_strftime_Buffer_Over-read.py", line 2, in <module>
        strftime("AA%"*0x10000)
    ValueError: Invalid format string
    Python 3.5.0rc2 (v3.5.0rc2:cc15d736d860, Aug 25 2015, 05:08:37) [MSC v.1900 64 bit (AMD64)] on win32
    Type "help", "copyright", "credits" or "license" for more information.
    >>> from time import *
    >>> strftime("AA%"*0x10000)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    ValueError: Invalid format string

    @abalkin
    Copy link
    Member

    abalkin commented Sep 4, 2015

    Sorry for a bad guess. John's advise makes much more sense than mine.

    @BreamoreBoy
    Copy link
    Mannequin

    BreamoreBoy mannequin commented Sep 4, 2015

    Okay, so presumably this applies to all supported versions and not just 3.5? Can you also remove the reproducer and supply specific instructions on how to build the code so that the problem can be reproduced.

    @BreamoreBoy BreamoreBoy mannequin added the OS-windows label Sep 4, 2015
    @JohnLeitch
    Copy link
    Mannequin Author

    JohnLeitch mannequin commented Sep 4, 2015

    It very well may apply to versions apart from 3.5. Our test environment is quite complex and unfriendly to working with multiple versions of Python. Plus, we're strapped for time, so we tend to file under the version we're currently targeting and defer to those better equipped and more familiar with the codebase.

    The repro file is fine and there's no need to rebuild anything; the heap verification I'm suggesting is applied to binaries, not source. You can find information here:

    https://msdn.microsoft.com/en-us/library/windows/hardware/ff549557(v=vs.85).aspx

    Install the debugging tools, run GFlags, select the python.exe/pythonw.exe image file, and apply the settings I mentioned in my previous post.

    @abalkin
    Copy link
    Member

    abalkin commented Sep 4, 2015

    John,

    There is no doubt that this is a bona fide bug. I was just hoping that someone with a Windows development machine would help figuring out the affected versions.

    From the hg history, it looks like the faulty code is present in all versions starting from 3.2:

    https://hg.python.org/cpython/annotate/977c5753ca32/Modules/timemodule.c#l515

    I will set the issue versions accordingly.

    @abalkin
    Copy link
    Member

    abalkin commented Sep 5, 2015

    I will keep the "security" classification, but will not increase the priority. Accepting format strings from untrusted sources is a vulnerability in and by itself and in most cases those strings are literals.

    @abalkin
    Copy link
    Member

    abalkin commented Sep 5, 2015

    FWIW, the patch looks good to me, but it needs to be reviewed by a Windows developer.

    @abalkin abalkin added the extension-modules C modules in the Modules dir label Sep 5, 2015
    @JohnLeitch
    Copy link
    Mannequin Author

    JohnLeitch mannequin commented Sep 5, 2015

    When I get a bit of slackspace (probably tomorrow afternoon/evening) I can test on the spectrum of versions to confirm the issue is in >= 3.2. I'll also look into improving our automation so all future reports can have the appropriate versions flagged.

    Regarding untrusted format strings, I believe you are mistaken. In native applications, untrusted format strings are problematic because an attacker can use injected tokens to read/write arbitrary memory, which can be leveraged to attain code execution.

    However, in the context of Python, a format string with too many tokens should be handled safely, resulting in a Python exception rather than exploitable memory corruption. This is the behavior observed in format string handling throughout Python (and indeed most managed/scripting languages). Yes, in most Python programs format strings will be constants, and using dynamically constructed format strings may be considered a bad practice. But, should a developer choose to pass a dynamically constructed string (for example, functionality that allows untrusted users to specify custom time formatting), it's not unreasonable for them to expect memory safety to be maintained.

    Of course, if there's a risk I'm overlooking I'd like to better understand it, and the relevant Python documentation should be updated.

    @abalkin
    Copy link
    Member

    abalkin commented Sep 5, 2015

    As far as I know, the practical consequence of "security" classification for the issue is how many affected older versions will be patched. I am keeping that and the 3.2 - 3.5 versions range.

    The priority may affect whether this will make it into 3.5.0. I'll defer this decision to the release managers.

    If you want an authoritative answer on the impact of this issue - report it to CERT or FIRST and get them compute the CVSS score.

    @larryhastings
    Copy link
    Contributor

    Yes, I'd like this fix in 3.5.0.

    One bit of feedback on the patch: outbuf is a char * (or wchar_t *), therefore outbuf[1] is a char (or wchar_t). You shouldn't compare it to NULL. I'm not sure we still support any compilers that define NULL as (void *)0, but we might. Anyway, please instead compare to '\0'.

    @abalkin
    Copy link
    Member

    abalkin commented Sep 5, 2015

    if there's a risk I'm overlooking I'd like to better understand it,
    and the relevant Python documentation should be updated.

    I don't think there is any special risk that you are overlooking other than a documented fact that Python's strftime is a thin layer on top of system strftime and these are notoriously buggy on many systems.

    A python application that accepts custom formats from users should limit those formats to a set that is known to work on the targeted platforms. Relying on strftime to properly return an error code and not do anything nasty is probably not a good idea.

    This said, I express no opinion on the severity of this bug.

    @JohnLeitch
    Copy link
    Mannequin Author

    JohnLeitch mannequin commented Sep 5, 2015

    Attached is a revised patch.

    @abalkin
    Copy link
    Member

    abalkin commented Sep 5, 2015

    Hmm, on Mac OSX "%" and "A%" are valid format strings:

    >>> time.strftime("%")
    '%'
    >>> time.strftime("A%")
    'A%'

    Mark's experiments show that on Windows they are not. What about the other platforms affected by the patch? I am concerned about the bottom part of the patch.

    A nitpick: an existing error message is "Invalid format string". I would either reuse it or make it "Incomplete format string".

    @JohnLeitch
    Copy link
    Mannequin Author

    JohnLeitch mannequin commented Sep 5, 2015

    I plucked the error message from the % operator:

    >>> '%' % 'foo'
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    ValueError: incomplete format
    >>> '%z' % 'foo'
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    ValueError: unsupported format character 'z' (0x7a) at index 1

    That said, it should be more consistent. I went with the latter suggestion since it's more informative. An updated patch is attached.

    As for platform compatibility, I'd certainly feel better if someone could test the Sun/AIX changes. Unfortunately, I'm unable to do so. It's interesting that OSX accepts "%". As per the spec, "%%" is a "%" literal, and the behavior of invalid tokens such as "%" is undefined.

    @malemburg
    Copy link
    Member

    On 05.09.2015 03:49, Alexander Belopolsky wrote:
    > 
    > Alexander Belopolsky added the comment:
    > 
    > Hmm, on Mac OSX "%" and "A%" are valid format strings:
    > 
    >>>> time.strftime("%")
    > '%'
    >>>> time.strftime("A%")
    > 'A%'
    > 
    > Mark's experiments show that on Windows they are not. What about the other platforms affected by the patch?  I am concerned about the bottom part of the patch.

    Trailing '%' are valid on Linux, just as using unsupported format
    codes (those as passed through as is).

    On Windows, the C lib strftime() segfaults with a trailing '%', just
    as it does with unsupported format codes.

    I have this code in mxDateTime to check for this:

    static
    int _mxDateTime_CheckWindowsStrftime(char *fmt,
    				     struct tm *tm)
    {
        char *p;
    /* Range checks \*/
    Py_Assert(tm-\>tm_sec \< 60,
          PyExc_ValueError,
          ".strftime() cannot format leap seconds on Windows");
    Py_Assert(tm-\>tm_mday \<= 31,
          PyExc_ValueError,
          ".strftime() cannot format days \> 31 on Windows");
    
        /* Scan format string for invalid codes */
        for (p = fmt; *p != '\0'; p++) {
    	register char code;
    	if (*p != '%')
    	    continue;
    	code = *++p;
    	/* Check for supported format codes; see
    	   https://msdn.microsoft.com/en-us/library/fe06s4ak.aspx */
    	switch (code) {
    	case 'a':
    	case 'A':
    	case 'b':
    	case 'B':
    	case 'c':
    	case 'd':
    	case 'H':
    	case 'I':
    	case 'i':
    	case 'm':
    	case 'M':
    	case 'p':
    	case 'S':
    	case 'U':
    	case 'w':
    	case 'W':
    	case 'x':
    	case 'X':
    	case 'y':
    	case 'Y':
    	case 'z':
    	case 'Z':
    	case '%':
    	    continue;
    	case '\0':
    	    Py_Error(PyExc_ValueError,
    		     "format string may not end with a '%' character "
    		     "on Windows");
    	default:
    	    Py_ErrorWithArg(PyExc_ValueError,
    			    "format code '%c' not supported on Windows",
    			    code);
    	}
        }
        return 0;
    
      onError:
        return -1;
    }

    @eryksun
    Copy link
    Contributor

    eryksun commented Sep 5, 2015

    On Windows, the C lib strftime() segfaults with a trailing '%',
    just as it does with unsupported format codes.

    No, it doesn't segfault; it invokes the invalid parameter handler (IPH). This could always be configured for the whole process, but with VC 14 it can also be configured per thread. I think it was Steve Dower who updated time_strftime to take advantage of this feature in 3.5, using the new macros _Py_BEGIN_SUPPRESS_IPH and _Py_END_SUPPRESS_IPH. This allowed removing the following check that used to be in the code prior to 3.5:

            if (outbuf[1]=='\0' ||
                !strchr("aAbBcdHIjmMpSUwWxXyYzZ%", outbuf[1]))
            {
                PyErr_SetString(PyExc_ValueError, "Invalid format string");
                Py_DECREF(format);
                return NULL;
            }

    With this change the time module no longer has to make assumptions about what the CRT considers to be a valid format string in order to avoid invoking the IPH. However, this also accidentally removed checking whether outbuf[1]=='\0'. Now, instead of calling the default IPH that terminates the process, Python's _silent_invalid_parameter_handler gets called, which of course does nothing by design:

        >>> import time
        >>> time.strftime('A%')
        Breakpoint 0 hit
        python35!_silent_invalid_parameter_handler:
        00000000`5eadae50 c20000          ret     0
        0:000> k8
        Child-SP          RetAddr           Call Site
        00000000`002af018 000007fe`e9d8d2ab python35!_silent_invalid_parameter_handler
        00000000`002af020 000007fe`e9d8d2c9 ucrtbase!invalid_parameter+0x103
        00000000`002af060 000007fe`e9dafedc ucrtbase!invalid_parameter_noinfo+0x9
        00000000`002af0a0 000007fe`e9dac359 ucrtbase!Wcsftime_l+0x168
        00000000`002af140 000007fe`e9dac3f5 ucrtbase!Strftime_l+0x155
        00000000`002af1d0 00000000`5e9fc785 ucrtbase!strftime+0x15
        00000000`002af210 00000000`5ea7d5c2 python35!time_strftime+0x1f5
        00000000`002af2b0 00000000`5eaf8fd0 python35!PyCFunction_Call+0x122

    @larryhastings
    Copy link
    Contributor

    Rather than debating about how various platforms handle malformed format strings for strftime(), and whether or not they crash, we should simply prevent the native strftime() function from seeing them in the first place.

    I'd like the "v3" patch in 3.5.0rc3. Can someone create a pull request?

    @eryksun
    Copy link
    Contributor

    eryksun commented Sep 5, 2015

    Rather than debating about how various platforms handle malformed
    format strings for strftime(), and whether or not they crash, we
    should simply prevent the native strftime() function from seeing
    them in the first place.

    Of course the check needs to be restored. I wasn't suggesting to go back to the default IPH to have it terminate the process. That won't help, anyway. The potential over-read is in the for loop, before calling strftime. I just thought the context for the accidental removal was relevant. Care needs to be taken when removing checks that were put in place to avoid the default IPH, because those checks may serve a dual purpose in the surrounding code.

    The patch also fixes the AIX/Sun code, which needs to be applied to 3.4 as well. I don't think this issue applies to 3.2 and 3.3.

    @zooba
    Copy link
    Member

    zooba commented Sep 5, 2015

    I'll do the PR.

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Sep 6, 2015

    New changeset 73227e857d6b by Steve Dower in branch '3.5':
    Issue bpo-24917: Backed out changeset 09b62202d9b7
    https://hg.python.org/cpython/rev/73227e857d6b

    New changeset 743924064dc8 by Steve Dower in branch 'default':
    Issue bpo-24917: Backed out changeset 09b62202d9b7
    https://hg.python.org/cpython/rev/743924064dc8

    @zooba
    Copy link
    Member

    zooba commented Sep 6, 2015

    Yeah, I came in late and in too much of a rush too :(

    Backed out the broken changes, and now I'm off to bed. Not convinced there's anything that needs fixing here, and IIRC we argued through the different platform behaviours on the issue when I removed the null check that was preventing the overread.

    If we do anything now, it should just be if (outbuf[1] == '\0') break to avoid the overread. strftime() is probably too far gone to make it consistent across platforms now.

    @JohnLeitch
    Copy link
    Mannequin Author

    JohnLeitch mannequin commented Sep 6, 2015

    Yes, this is a user-mode read, but I disagree with the assertion that it's not possible to use this to disclose memory. While it isn't as critical as something that outright dumps memory, there is logic that throws exceptions based on values encountered while reading outside the bounds of the buffer. This could be used as a channel to infer what is or isn't in adjacent memory. That it's user-mode doesn't matter--if an application exposes the format string as attack surface, suddenly process memory can be probed. So, it's not heartbleed, but it does have security implications. If you'd like, I can take a shot at building a PoC.

    Further, it's best to err on the side of caution with bugs like these; just because it doesn't seem like major issue now doesn't mean someone won't come along in the future and prove otherwise.

    @larryhastings
    Copy link
    Contributor

    I'm going to back this out of 3.5.0rc3. I'm still willing to discuss accepting it into 3.5.0 final in some form, but for now I don't want to hold up the release.

    Steve: it should never be possible to crash the Python interpreter with well-formed Python code. Whether or not it's possible to exploit it, I do think this crash is something the interpreter should prevent.

    @eryksun
    Copy link
    Contributor

    eryksun commented Sep 6, 2015

    MSVC seems somewhat inconsistent about its response:
    >>> strftime('aaa%')
    ''

    That's not due to MSVC. It's setting errno to EINVAL. The problem is that time_strftime is testing (buflen > 0 || i >= 256 * fmtlen). The initial value of the outbuf size i is 1024, so when (fmtlen <= 4), the value of (256 * fmtlen) is less than or equal to i, and it assumes "the format yields an empty result" without considering the value of errno. So instead of raising an exception for EINVAL, it calls PyUnicode_DecodeLocaleAndSize to return an empty string:

        >>> strftime('aaa%')
        Breakpoint 1 hit
        ucrtbase!strftime:
        000007fe`e2bac3e0 4883ec38        sub     rsp,38h
        0:000> gu
        python35!time_strftime+0x1f5:
        00000000`5de9c785 488bcb          mov     rcx,rbx
        0:000> be 0; g
        Breakpoint 0 hit
        ucrtbase!_errno:
        000007fe`e2b341b0 48895c2408      mov     qword ptr [rsp+8],rbx ss:00000000`0028f320=ffffffffffffffff

    errno is 22:

    0:000> pt; dd @rax l1
    00000000`002ddb50  00000016
    0:000> bd 0; g
    Breakpoint 2 hit
    python35!PyUnicode_DecodeLocaleAndSize:
    00000000`5df55070 4053            push    rbx
    0:000> k 2
    Child-SP          RetAddr           Call Site
    00000000`0028f318 00000000`5de9c81a python35!PyUnicode_DecodeLocaleAndSize
    00000000`0028f320 00000000`5df1d5c2 python35!time_strftime+0x28a
    0:000> g
    ''
    

    @zooba
    Copy link
    Member

    zooba commented Sep 6, 2015

    Larry: agreed, no crashing. We'll get the trivial fix in that doesn't raise any error, and should probably stop suppressing the exception in the situation eryksun describes. That will correctly expose the platform's strftime semantics.

    Unless you want to write more thorough validation of incoming strings, I'll work something up later today or tomorrow.

    @zooba
    Copy link
    Member

    zooba commented Sep 6, 2015

    New patch attached that just breaks out of the scanning loop and lets the system CRT handle invalid format strings. Fixes the condition that was suppressing some errors on Windows.

    Also fixed the PEP-7 issues around the changed code (I believe). No new test because the crash cannot be reliably reproduced and the expected results vary dramatically between platforms.

    The two if/break lines in the AIX branch should also be applied to 3.4 to prevent crashes. The rest of the fix (including the PEP-7 fixes) should not.

    @JohnLeitch
    Copy link
    Mannequin Author

    JohnLeitch mannequin commented Sep 6, 2015

    First, let me begin by saying I believe this patch will fix the buffer over-read, which is a good step forward.

    However, after giving the matter more thought, and at the risk of wearing out my welcome, I am of the belief that relying on the CRT to handle malformed format strings is the wrong approach. As per the C spec, strftime's behavior when handling invalid format strings is undefined:

    "If a conversion specifier is not one of the above, the behavior is undefined"

    Quite often, "undefined" translates to "exploitable". And at the very least, by not performing thorough enough validation, Python is misusing strftime(), which may lead to crashes or undermine memory safety. Of course, this is all speculation--I haven't the time or resource to learn other platforms to see what's possible. But, even if I could, the task would be Sisyphean because there's simply no way to know what the future holds when dealing with implementation that could change at any point.

    I realize we must be pragmatic with matters such as this, and a dramatic change could be breaking for some Python apps. Even so, I feel it's worth vocalizing these concerns. As a principal, I think that "safe", well-formed Python should never be able to perform operations that lead to undefined behavior in the underlying runtime.

    Alright, rant done. If at any point in time locking down Python's strftime with more aggressive validation is considered viable, I am more than willing to take a shot at submitting a patch.

    @abalkin
    Copy link
    Member

    abalkin commented Sep 6, 2015

    Historically, time and os modules have been considered low level, thin wrappers over system libc functions. Users of these modules are the proverbial "consenting adults" who should understand their power and the associated risks.

    The place to provide a better behaved strftime function is the datetime module, but I would leave time.stftime pretty much the way it is and would only consider making *fewer* checks on Windows if _Py_BEGIN_SUPPRESS_IPH / _Py_END_SUPPRESS_IPH macros can indeed help to avoid crashes on illegal format strings.

    @zooba
    Copy link
    Member

    zooba commented Sep 6, 2015

    FWIW, those macros prevent a certain class of "undefined behavior," which in this case is to terminate the process rather than return an error code. They don't do anything to prevent crashes due to exploitation or malicious code.

    The place for more robust formatting is datetime.__format__(), which from a quick test seems to be more strict about what is in the format string.

    @larryhastings
    Copy link
    Contributor

    Having slept on it, I agree with Steve. We should make the minimal change necessary in order to not crash.

    However, it still needs a regression test. The test can use JohnLeitch's proposed test as a good starting point, but it must accept either success or ValueError as passing the test.

    I still think there's merit in detecting a trailing unquoted % at the end of the format string, but that would only be appropriate for 3.6 at this point.

    @eryksun
    Copy link
    Contributor

    eryksun commented Sep 6, 2015

    With MSVC, if errno is cleared before calling strftime, then when buflen == 0 you know whether it's an invalid format string (EINVAL) or maxsize is too small (ERANGE). There's no need to guess. Oddly, only EINVAL is documented, even though setting ERANGE has been in the implementation for years.

    VC 10 (strftime.c):

            /* error - return an empty string */
            *(strstart)='\0';
    
            /* now return our error/insufficient buffer indication */
            if ( !failed && left <= 0 )
            {
                /* do not report this as an error to allow the caller to resize */
                errno=ERANGE;
            }
            else
            {
                _VALIDATE_RETURN( FALSE, EINVAL, 0);
            }
    

    VC 14 (time/wcsftime.cpp):

        // Error:  return an empty string:
        *string = L'\0';
    
        // Now return our error/insufficient buffer indication:
        if (!failed && remaining <= 0)
        {
            // Do not report this as an error to allow the caller to resize:
            errno = ERANGE;
        }
        else
        {
            _VALIDATE_RETURN(false, EINVAL, 0);
        }
    

    @zooba
    Copy link
    Member

    zooba commented Sep 6, 2015

    How's this for the test (I added an explanatory comment, since it may look like it isn't testing anything to someone who isn't familiar with the issue):

        def test_strftime_format_check(self):
            # Test that strftime does not crash on invalid format strings
            # that may trigger a buffer overread. When not triggered,
            # strftime may succeed or raise ValueError depending on
            # the platform.
            for x in [ '', 'A', '%A', '%AA' ]:
                for y in range(0x0, 0x10):
                    for z in [ '%', 'A%', 'AA%', '%A%', 'A%A%', '%#' ]:
                        try:
                            time.strftime(x * y + z)
                        except ValueError:
                            pass

    @zooba
    Copy link
    Member

    zooba commented Sep 7, 2015

    @larryhastings
    Copy link
    Contributor

    Steve, did you confirm that the test triggers the array bounds bug when the patch *isn't* applied? I want to confirm both that a) the test exercises the bug, and b) the fix fixes the bug. I assume you ran the test suite with the patch applied, so that covers b).

    @zooba
    Copy link
    Member

    zooba commented Sep 7, 2015

    I wasn't able to repro the crash at all, even with the debugging flags that make this sort of issue more prominent. It relies on a very precise layout of multiple objects in memory, or possibly a specific sequence of allocations/deallocations, as well as a format string ending in an unescaped '%' or (on Windows) '%#'.

    It's still obviously an issue though - we should have the check for '\0' there by any reasonably analysis of the code, or else should not be adding 2 to the pointer to start the next step of the search.

    @larryhastings
    Copy link
    Contributor

    Okay, I think *I* reproduced it.

    1. I pulled your cpython350 fork down locally.

    2. I updated to your checkin that fixed the bug. (c31dad22c80d)

    3. I reverted the change to Modules/timemodule.c to put the bug back:
      % hg cat -r 97393 Modules/timemodule.c > Modules/timemodule.c

    4. I changed line 611 (or so) from "#if defined(MS_WINDOWS) && !defined(HAVE_WCSFTIME)" to "#if 1" so I'd get the code that had the bug.

    5. I ran "./configure --with-valgrind && make" to make it.

    6. I ran "valgrind ./python -m test test_time" and ***Valgrind complained about an array overrun***.

    7. I restored the bugfix to Modules/timemodule.c, then reinstated the change from 4) above.

    8. I ran make and valgrind again and didn't get the complaint about the array overrun.

    For grins I also tried enabling the other stanza of code that has the bug (the AIX / sun / have_wcsftime) and observed the same thing.

    Is that convincing?

    @zooba
    Copy link
    Member

    zooba commented Sep 7, 2015

    Good enough for me.

    @larryhastings
    Copy link
    Contributor

    Backout pull request merged, please forward-merge, thanks!

    @larryhastings
    Copy link
    Contributor

    (I meant, just normal pull request. I did your two pull requests right in a row and got my wires crossed.)

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Sep 7, 2015

    New changeset c31dad22c80d by Steve Dower in branch '3.5':
    Issue bpo-24917: time_strftime() buffer over-read.
    https://hg.python.org/cpython/rev/c31dad22c80d

    New changeset f185917498ca by Steve Dower in branch '3.4':
    Issue bpo-24917: time_strftime() buffer over-read.
    https://hg.python.org/cpython/rev/f185917498ca

    @vstinner
    Copy link
    Member

    vstinner commented Sep 8, 2015

    The change c31dad22c80d introduced a regression: issue bpo-25029.

    @vstinner vstinner reopened this Sep 8, 2015
    @vstinner
    Copy link
    Member

    vstinner commented Sep 8, 2015

    Oh, tracemalloc sees the memory peak of 8 GB too:

    $ ./python -X tracemalloc -i -m test -v test_strptime
    ...
    SystemExit: True
    >>> import tracemalloc; tracemalloc.get_traced_memory()[1] / 1024.**2
    8201.658247947693

    Memory peak: 8201.7 MB (8.2 GB!).

    @zooba
    Copy link
    Member

    zooba commented Sep 8, 2015

    I'll fix it on bpo-25029. This thread is already too long for my liking.

    @zooba zooba closed this as completed Sep 8, 2015
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    deferred-blocker extension-modules C modules in the Modules dir OS-windows type-security A security issue
    Projects
    None yet
    Development

    No branches or pull requests

    6 participants