classification
Title: time_strftime() Buffer Over-read
Type: security Stage: resolved
Components: Extension Modules, Windows Versions: Python 3.4, Python 3.5
process
Status: closed Resolution: fixed
Dependencies: Superseder: MemoryError in test_strptime
View: 25029
Assigned To: steve.dower Nosy List: BreamoreBoy, JohnLeitch, belopolsky, brycedarling, eryksun, georg.brandl, larry, lemburg, paul.moore, python-dev, steve.dower, tim.golden, vstinner, zach.ware
Priority: deferred blocker Keywords: patch

Created on 2015-08-22 23:39 by JohnLeitch, last changed 2015-09-08 15:15 by steve.dower. This issue is now closed.

Files
File name Uploaded Description Edit
time_strftime_Buffer_Over-read.patch JohnLeitch, 2015-08-22 23:39 patch
time_strftime_Buffer_Over-read.py JohnLeitch, 2015-08-22 23:40 repro
time_strftime_Buffer_Over-read_v2.patch JohnLeitch, 2015-09-05 01:29 patch
time_strftime_Buffer_Over-read_v3.patch JohnLeitch, 2015-09-05 02:16
24917_4.patch steve.dower, 2015-09-06 19:50
Messages (67)
msg248998 - (view) Author: John Leitch (JohnLeitch) * Date: 2015-08-22 23:39
Python 3.5 suffers from a vulnerability caused by the behavior of the time_strftime() function. When called, the function loops over the format string provided, using strchr to search for each instance of '%'. After finding a '%', it continues to search two characters ahead, assuming that each instance is the beginning of a well formed format string token.

However, if a string ends with '%', this logic will result in a call to strchr that reads off the end of the format string buffer:

    /* check that the format string contains only valid directives */
    for(outbuf = strchr(fmt, '%'); <<<< Assuming fmt ends with a '%', this will return a pointer to the end of the string.
        outbuf != NULL;
        outbuf = strchr(outbuf+2, '%')) <<<< Once outbuf is pointing to the end of the string, outbuf+2 skips
    {                                        past the null terimnator, leading to a buffer over-read.
        if (outbuf[1]=='#')
            ++outbuf; /* not documented by python, */
        if ((outbuf[1] == 'y') && buf.tm_year < 0)
        {
            PyErr_SetString(PyExc_ValueError,
                        "format %y requires year >= 1900 on Windows");
            Py_DECREF(format);
            return NULL;
        }
    }

In some applications, it may be possible to exploit this behavior to disclose the contents of adjacent memory. The buffer over-read can be observed by running the following script:

from time import *
strftime("AA%"*0x10000)

Which, depending on the arrangement of memory, may produce an exception such as this:

0:000> g
(20b8.18d4): Access violation - code c0000005 (first chance)
First chance exceptions are reported before any exception handling.
This exception may be expected and handled.
eax=ffffffff ebx=52c1a6a0 ecx=00000000 edx=08ef3000 esi=08ec2fe8 edi=08ec2ff8
eip=52d254f3 esp=004cf9d4 ebp=004cfa58 iopl=0         nv up ei pl nz na pe nc
cs=0023  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00010206
python35!strchr+0x33:
52d254f3 f30f6f0a        movdqu  xmm1,xmmword ptr [edx] ds:002b:08ef3000=????????????????????????????????
0:000> db edx-0x10
08ef2ff0  41 25 41 41 25 41 41 25-00 d0 d0 d0 d0 d0 d0 d0  A%AA%AA%........
08ef3000  ?? ?? ?? ?? ?? ?? ?? ??-?? ?? ?? ?? ?? ?? ?? ??  ????????????????
08ef3010  ?? ?? ?? ?? ?? ?? ?? ??-?? ?? ?? ?? ?? ?? ?? ??  ????????????????
08ef3020  ?? ?? ?? ?? ?? ?? ?? ??-?? ?? ?? ?? ?? ?? ?? ??  ????????????????
08ef3030  ?? ?? ?? ?? ?? ?? ?? ??-?? ?? ?? ?? ?? ?? ?? ??  ????????????????
08ef3040  ?? ?? ?? ?? ?? ?? ?? ??-?? ?? ?? ?? ?? ?? ?? ??  ????????????????
08ef3050  ?? ?? ?? ?? ?? ?? ?? ??-?? ?? ?? ?? ?? ?? ?? ??  ????????????????
08ef3060  ?? ?? ?? ?? ?? ?? ?? ??-?? ?? ?? ?? ?? ?? ?? ??  ????????????????
0:000> k5
ChildEBP RetAddr  
004cf9d0 52c1a7f6 python35!strchr+0x33 [f:\dd\vctools\crt\vcruntime\src\string\i386\strchr_sse.inc @ 75]
004cfa58 52c832d3 python35!time_strftime+0x156 [c:\build\cpython\modules\timemodule.c @ 615]
004cfa74 52ce442f python35!PyCFunction_Call+0x113 [c:\build\cpython\objects\methodobject.c @ 109]
004cfaa8 52ce18ec python35!call_function+0x2ff [c:\build\cpython\python\ceval.c @ 4651]
004cfb20 52ce339f python35!PyEval_EvalFrameEx+0x232c [c:\build\cpython\python\ceval.c @ 3184]

To fix this issue, it is recommended that time_strftime() be updated to check outputbuf[1] for null in the body of the format string directive validation loop. A proposed patch is attached.

Credit: John Leitch (johnleitch@outlook.com), Bryce Darling (darlingbryce@gmail.com)
msg249800 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2015-09-04 18:15
Is there a CERT report associated with this vulnerability?
msg249801 - (view) Author: John Leitch (JohnLeitch) * Date: 2015-09-04 18:18
Currently, no. Would you like us to report this and future vulnerabilities to CERT?
msg249805 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2015-09-04 18:32
This is happening in windows-only code, so not being a windows user I cannot reproduce it.  It does not look like something new.  This code was last touched in #10653.

Can someone confirm that only 3.5 is affected?
msg249806 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2015-09-04 18:34
> Would you like us to report this and future vulnerabilities to CERT?

If it affects released versions, I think this is the right thing to do.  Note that I don't know what PSF policy is on this is or whether they even have one.
msg249857 - (view) Author: Mark Lawrence (BreamoreBoy) * Date: 2015-09-04 23:08
I have tried the reproducer on Windows 10 with 2.6, 2.7, 3.3, 3.4, 3.5 and 3.6.  In every case I got this.

C:\py -3.4 -c"from time import *;strftime('AA%'*0x10000)"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
ValueError: Invalid format string
msg249860 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2015-09-04 23:12
> In every case I got [ValueError]

You shouldn't have.  I am no Windows expert, but I suspect something is wrong with your use of the command line.  Please try it at the Python prompt or put the code in a script.
msg249861 - (view) Author: John Leitch (JohnLeitch) * Date: 2015-09-04 23:23
> I have tried the reproducer on Windows 10 with 2.6, 2.7, 3.3, 3.4, 3.5 and 3.6.  In every case I got this.

What you are observing is due to the arrangement and contents of process memory. With a simple repro (such as the one provided), there's a good chance the null terminator of the format string will be followed by more null bytes, and thus the code will appear to work as intended. In more complex scripts where memory is ultimately reused, it's more likely that the null terminator will be followed by garbage, non-null bytes.

To make the issue reproduce more reliably, use GFlags to enable heap tail checking, heap free checking, and page heap. 

https://msdn.microsoft.com/en-us/library/windows/hardware/ff549557(v=vs.85).aspx

Then, when you repro the issue, you'll see the crash because the uninitialized memory will contain the fill pattern 0xd0 rather than 0x00, like this:

0:000> db edx-0x10
08ef2ff0  41 25 41 41 25 41 41 25-00 d0 d0 d0 d0 d0 d0 d0  A%AA%AA%........
08ef3000  ?? ?? ?? ?? ?? ?? ?? ??-?? ?? ?? ?? ?? ?? ?? ??  ????????????????
08ef3010  ?? ?? ?? ?? ?? ?? ?? ??-?? ?? ?? ?? ?? ?? ?? ??  ????????????????
08ef3020  ?? ?? ?? ?? ?? ?? ?? ??-?? ?? ?? ?? ?? ?? ?? ??  ????????????????
08ef3030  ?? ?? ?? ?? ?? ?? ?? ??-?? ?? ?? ?? ?? ?? ?? ??  ????????????????
08ef3040  ?? ?? ?? ?? ?? ?? ?? ??-?? ?? ?? ?? ?? ?? ?? ??  ????????????????
08ef3050  ?? ?? ?? ?? ?? ?? ?? ??-?? ?? ?? ?? ?? ?? ?? ??  ????????????????
08ef3060  ?? ?? ?? ?? ?? ?? ?? ??-?? ?? ?? ?? ?? ?? ?? ??  ????????????????

To be clear, heap verification is not a requirement--the bug can indeed be reproduced without it. However, it will make life easier by introducing more determinism.
msg249862 - (view) Author: Mark Lawrence (BreamoreBoy) * Date: 2015-09-04 23:26
Python prompt or script makes no difference.

C:\Users\Mark\Desktop>py -2.6 time_strftime_Buffer_Over-read.py
Traceback (most recent call last):
  File "time_strftime_Buffer_Over-read.py", line 2, in <module>
    strftime("AA%"*0x10000)
ValueError: Invalid format string

Python 3.5.0rc2 (v3.5.0rc2:cc15d736d860, Aug 25 2015, 05:08:37) [MSC v.1900 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> from time import *
>>> strftime("AA%"*0x10000)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: Invalid format string
msg249863 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2015-09-04 23:27
Sorry for a bad guess.  John's advise makes much more sense than mine.
msg249864 - (view) Author: Mark Lawrence (BreamoreBoy) * Date: 2015-09-04 23:34
Okay, so presumably this applies to all supported versions and not just 3.5?  Can you also remove the reproducer and supply specific instructions on how to build the code so that the problem can be reproduced.
msg249867 - (view) Author: John Leitch (JohnLeitch) * Date: 2015-09-04 23:47
It very well may apply to versions apart from 3.5. Our test environment is quite complex and unfriendly to working with multiple versions of Python. Plus, we're strapped for time, so we tend to file under the version we're currently targeting and defer to those better equipped and more familiar with the codebase.

The repro file is fine and there's no need to rebuild anything; the heap verification I'm suggesting is applied to binaries, not source. You can find information here:

https://msdn.microsoft.com/en-us/library/windows/hardware/ff549557(v=vs.85).aspx

Install the debugging tools, run GFlags, select the python.exe/pythonw.exe image file, and apply the settings I mentioned in my previous post.
msg249868 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2015-09-04 23:58
John,

There is no doubt that this is a bona fide bug.  I was just hoping that someone with a Windows development machine would help figuring out the affected versions.

From the hg history, it looks like the faulty code is present in all versions starting from 3.2:  

https://hg.python.org/cpython/annotate/977c5753ca32/Modules/timemodule.c#l515

I will set the issue versions accordingly.
msg249870 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2015-09-05 00:03
I will keep the "security" classification, but will not increase the priority.  Accepting format strings from untrusted sources is a vulnerability in and by itself and in most cases those strings are literals.
msg249872 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2015-09-05 00:10
FWIW, the patch looks good to me, but it needs to be reviewed by a Windows developer.
msg249874 - (view) Author: John Leitch (JohnLeitch) * Date: 2015-09-05 00:41
When I get a bit of slackspace (probably tomorrow afternoon/evening) I can test on the spectrum of versions to confirm the issue is in >= 3.2. I'll also look into improving our automation so all future reports can have the appropriate versions flagged.

Regarding untrusted format strings, I believe you are mistaken. In native applications, untrusted format strings are problematic because an attacker can use injected tokens to read/write arbitrary memory, which can be leveraged to attain code execution.

However, in the context of Python, a format string with too many tokens should be handled safely, resulting in a Python exception rather than exploitable memory corruption. This is the behavior observed in format string handling throughout Python (and indeed most managed/scripting languages). Yes, in most Python programs format strings will be constants, and using dynamically constructed format strings may be considered a bad practice. But, should a developer choose to pass a dynamically constructed string (for example, functionality that allows untrusted users to specify custom time formatting), it's not unreasonable for them to expect memory safety to be maintained.

Of course, if there's a risk I'm overlooking I'd like to better understand it, and the relevant Python documentation should be updated.
msg249876 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2015-09-05 00:57
As far as I know, the practical consequence of "security" classification for the issue is how many affected older versions will be patched.  I am keeping that and the 3.2 - 3.5 versions range.

The priority may affect whether this will make it into 3.5.0.  I'll defer this decision to the release managers.

If you want an authoritative answer on the impact of this issue - report it to CERT or FIRST and get them compute the CVSS score.
msg249877 - (view) Author: Larry Hastings (larry) * (Python committer) Date: 2015-09-05 01:06
Yes, I'd like this fix in 3.5.0.

One bit of feedback on the patch: outbuf is a char * (or wchar_t *), therefore outbuf[1] is a char (or wchar_t).  You shouldn't compare it to NULL.  I'm not sure we still support any compilers that define NULL as (void *)0, but we might.  Anyway, please instead compare to '\0'.
msg249878 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2015-09-05 01:12
> if there's a risk I'm overlooking I'd like to better understand it,
> and the relevant Python documentation should be updated.

I don't think there is any special risk that you are overlooking other than a documented fact that Python's strftime is a thin layer on top of system strftime and these are notoriously buggy on many systems.

A python application that accepts custom formats from users should limit those formats to a set that is known to work on the targeted platforms.  Relying on strftime to properly return an error code and not do anything nasty is probably not a good idea.

This said, I express no opinion on the severity of this bug.
msg249881 - (view) Author: John Leitch (JohnLeitch) * Date: 2015-09-05 01:29
Attached is a revised patch.
msg249883 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2015-09-05 01:49
Hmm, on Mac OSX "%" and "A%" are valid format strings:

>>> time.strftime("%")
'%'
>>> time.strftime("A%")
'A%'

Mark's experiments show that on Windows they are not. What about the other platforms affected by the patch?  I am concerned about the bottom part of the patch.

A nitpick: an existing error message is "Invalid format string".  I would either reuse it or make it "Incomplete format string".
msg249884 - (view) Author: John Leitch (JohnLeitch) * Date: 2015-09-05 02:16
I plucked the error message from the % operator:

>>> '%' % 'foo'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: incomplete format
>>> '%z' % 'foo'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: unsupported format character 'z' (0x7a) at index 1

That said, it should be more consistent. I went with the latter suggestion since it's more informative. An updated patch is attached.

As for platform compatibility, I'd certainly feel better if someone could test the Sun/AIX changes. Unfortunately, I'm unable to do so. It's interesting that OSX accepts "%". As per the spec, "%%" is a "%" literal, and the behavior of invalid tokens such as "%" is undefined.
msg249899 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2015-09-05 09:55
On 05.09.2015 03:49, Alexander Belopolsky wrote:
> 
> Alexander Belopolsky added the comment:
> 
> Hmm, on Mac OSX "%" and "A%" are valid format strings:
> 
>>>> time.strftime("%")
> '%'
>>>> time.strftime("A%")
> 'A%'
> 
> Mark's experiments show that on Windows they are not. What about the other platforms affected by the patch?  I am concerned about the bottom part of the patch.

Trailing '%' are valid on Linux, just as using unsupported format
codes (those as passed through as is).

On Windows, the C lib strftime() segfaults with a trailing '%', just
as it does with unsupported format codes.

I have this code in mxDateTime to check for this:

static
int _mxDateTime_CheckWindowsStrftime(char *fmt,
				     struct tm *tm)
{
    char *p;

    /* Range checks */
    Py_Assert(tm->tm_sec < 60,
	      PyExc_ValueError,
	      ".strftime() cannot format leap seconds on Windows");
    Py_Assert(tm->tm_mday <= 31,
	      PyExc_ValueError,
	      ".strftime() cannot format days > 31 on Windows");

    /* Scan format string for invalid codes */
    for (p = fmt; *p != '\0'; p++) {
	register char code;
	if (*p != '%')
	    continue;
	code = *++p;
	/* Check for supported format codes; see
	   https://msdn.microsoft.com/en-us/library/fe06s4ak.aspx */
	switch (code) {
	case 'a':
	case 'A':
	case 'b':
	case 'B':
	case 'c':
	case 'd':
	case 'H':
	case 'I':
	case 'i':
	case 'm':
	case 'M':
	case 'p':
	case 'S':
	case 'U':
	case 'w':
	case 'W':
	case 'x':
	case 'X':
	case 'y':
	case 'Y':
	case 'z':
	case 'Z':
	case '%':
	    continue;
	case '\0':
	    Py_Error(PyExc_ValueError,
		     "format string may not end with a '%' character "
		     "on Windows");
	default:
	    Py_ErrorWithArg(PyExc_ValueError,
			    "format code '%c' not supported on Windows",
			    code);
	}
    }
    return 0;

  onError:
    return -1;
}
msg249904 - (view) Author: Eryk Sun (eryksun) * (Python triager) Date: 2015-09-05 11:41
> On Windows, the C lib strftime() segfaults with a trailing '%', 
> just as it does with unsupported format codes.

No, it doesn't segfault; it invokes the invalid parameter handler (IPH). This could always be configured for the whole process, but with VC 14 it can also be configured per thread. I think it was Steve Dower who updated time_strftime to take advantage of this feature in 3.5, using the new macros _Py_BEGIN_SUPPRESS_IPH and _Py_END_SUPPRESS_IPH. This allowed removing the following check that used to be in the code prior to 3.5:

        if (outbuf[1]=='\0' ||
            !strchr("aAbBcdHIjmMpSUwWxXyYzZ%", outbuf[1]))
        {
            PyErr_SetString(PyExc_ValueError, "Invalid format string");
            Py_DECREF(format);
            return NULL;
        }

With this change the time module no longer has to make assumptions about what the CRT considers to be a valid format string in order to avoid invoking the IPH. However, this also accidentally removed checking whether outbuf[1]=='\0'. Now, instead of calling the default IPH that terminates the process, Python's _silent_invalid_parameter_handler gets called, which of course does nothing by design:

    >>> import time
    >>> time.strftime('A%')
    Breakpoint 0 hit
    python35!_silent_invalid_parameter_handler:
    00000000`5eadae50 c20000          ret     0
    0:000> k8
    Child-SP          RetAddr           Call Site
    00000000`002af018 000007fe`e9d8d2ab python35!_silent_invalid_parameter_handler
    00000000`002af020 000007fe`e9d8d2c9 ucrtbase!invalid_parameter+0x103
    00000000`002af060 000007fe`e9dafedc ucrtbase!invalid_parameter_noinfo+0x9
    00000000`002af0a0 000007fe`e9dac359 ucrtbase!Wcsftime_l+0x168
    00000000`002af140 000007fe`e9dac3f5 ucrtbase!Strftime_l+0x155
    00000000`002af1d0 00000000`5e9fc785 ucrtbase!strftime+0x15
    00000000`002af210 00000000`5ea7d5c2 python35!time_strftime+0x1f5
    00000000`002af2b0 00000000`5eaf8fd0 python35!PyCFunction_Call+0x122
msg249905 - (view) Author: Larry Hastings (larry) * (Python committer) Date: 2015-09-05 11:51
Rather than debating about how various platforms handle malformed format strings for strftime(), and whether or not they crash, we should simply prevent the native strftime() function from seeing them in the first place.

I'd like the "v3" patch in 3.5.0rc3.  Can someone create a pull request?
msg249908 - (view) Author: Eryk Sun (eryksun) * (Python triager) Date: 2015-09-05 12:27
> Rather than debating about how various platforms handle malformed 
> format strings for strftime(), and whether or not they crash, we 
> should simply prevent the native strftime() function from seeing 
> them in the first place.

Of course the check needs to be restored. I wasn't suggesting to go back to the default IPH to have it terminate the process. That won't help, anyway. The potential over-read is in the for loop, before calling strftime. I just thought the context for the accidental removal was relevant. Care needs to be taken when removing checks that were put in place to avoid the default IPH, because those checks may serve a dual purpose in the surrounding code.

The patch also fixes the AIX/Sun code, which needs to be applied to 3.4 as well. I don't think this issue applies to 3.2 and 3.3.
msg249919 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2015-09-05 19:07
I'll do the PR.
msg249920 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2015-09-05 19:18
> Rather than debating about how various platforms handle malformed 
> format strings for strftime(), and whether or not they crash, we 
> should simply prevent the native strftime() function from seeing 
> them in the first place.

The crash is nothing to do with the native strftime() function - Python was crashing because it *tried* to prevent malformed strings and we (I) got it wrong. If we were simply passing the string through unchecked this would not have been an issue (or it would have been an issue against the CRT).

PR at https://bitbucket.org/larry/cpython350/pull-requests/18/issue-24917-time_strftime-buffer-over-read/diff
msg249950 - (view) Author: Larry Hastings (larry) * (Python committer) Date: 2015-09-06 00:13
Pull request accepted.  Please forward-merge.  Thanks!
msg249956 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2015-09-06 04:02
New changeset 09b62202d9b7 by Steve Dower in branch '3.5':
Issue #24917: time_strftime() Buffer Over-read. Patch by John Leitch.
https://hg.python.org/cpython/rev/09b62202d9b7

New changeset 8b81b7ad2d0a by Steve Dower in branch '3.5':
Issue #24917: Moves NEWS entry under Library.
https://hg.python.org/cpython/rev/8b81b7ad2d0a

New changeset a29b49d57769 by Steve Dower in branch '3.4':
Issue #24917: time_strftime() Buffer Over-read. Patch by John Leitch.
https://hg.python.org/cpython/rev/a29b49d57769

New changeset 6fc744ac3953 by Steve Dower in branch 'default':
Issue #24917: time_strftime() Buffer Over-read. Patch by John Leitch.
https://hg.python.org/cpython/rev/6fc744ac3953
msg249958 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2015-09-06 04:08
I applied the AIX fix to 3.4 (introduced in #19634). Python 3.2 and 3.3 aren't affected.
msg249960 - (view) Author: Larry Hastings (larry) * (Python committer) Date: 2015-09-06 04:49
The tests from this patch fail on Linux.

-----
First: There is no trailing % test on Linux, and glibc's strftime() happily ignores a trailing %, so no ValueError is raised.

Python should do either one or the other of the following:

1) Python should enforce no trailing % in the strftime format string,
   or

2) the test suite shouldn't assume that a trailing % in the strftime
   value string raises a ValueError.

I can live with either of these, not sure what the right decision is.

-----
Second: The test from the patch assumes that strftime('%#') will raise a ValueError.  Again, strftime in Linux glibc happily accepts "%#" as a format string and thus no ValueError is raised.

Python is agnostic about native format units in the strftime() format string.  Therefore I strongly assert that Python must not assume that "%#" is an illegal format string.  Therefore the tests must not assume that "%#" raises ValueError.

Given that the code used to crash, I do want the code path exercised in the test suite.  So I propose that the test attempt time.strftime('%#') and accept either success or ValueError.


-----

Given that I've accepted this patch into 3.5.0, and it's now blocking my release, it is implicitly a "release blocker".  I need to resolve this tonight before I can tag 3.5.0rc3.  I'm going to dinner, and maybe we can have a quick discussion and come to a decision in the next hour or two.


p.s. The checkin also flunked PEP 7.  *sigh*
msg249961 - (view) Author: John Leitch (JohnLeitch) * Date: 2015-09-06 05:01
Is there a way to see what style guidelines have been violated? The only thing I can think of is the curly braces in the Windows check, but I was following the conventions of the surrounding code.
msg249962 - (view) Author: Larry Hastings (larry) * (Python committer) Date: 2015-09-06 05:10
The starting curly brace goes on the same line as the statement starting the block.  Keywords followed by a left parenthesis get a space between the keyword and the parenthesis.  It's a small matter, I'm really much more interested in reconciling the behavior of strftime and its tests.
msg249963 - (view) Author: Larry Hastings (larry) * (Python committer) Date: 2015-09-06 05:24
Sorry, maybe you inherited those violations.  I was in a hurry and not in a charitable frame of mind.
msg249964 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2015-09-06 05:26
I'll fix the PEP 7 stuff (mostly inherited).

MSVC can also handle '%#' and '%' as format strings (producing '' in both cases). If that matches libc behaviour, I see no reason to raise a ValueError here apart from consistency with previous Python releases.

If consistency with system C libraries is preferred, then I'll change the condition to simply break out of the loop and make the test call the function expecting success. I assume there's an AIX buildbot we can watch to see the impact on that part?
msg249965 - (view) Author: John Leitch (JohnLeitch) * Date: 2015-09-06 05:30
Yikes--your comment prompted me to look at the check-in, and it seems my patch wasn't properly applied. The curly braces got tweaked, which is minor as you stated, but more importantly the AIX code should not decref format. That could introduce problems bigger than what this patch was attempting to fix.

And, not to dwell, but where do you see a keyword immediately followed by a left parens? I want to make sure everything is properly polished in the future, and the only thing I see is the untouched "for".

Regarding your initial concerns:
1) I think we should enforce no trailing % so as to not pass format strings that may cause undefined behavior.

2) How about expecting ValueError on Windows/AIX, and pass on all other platforms?
msg249966 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2015-09-06 05:34
Whoops, must have done a bad copy-paste to get that DECREF in there (I couldn't apply the patch directly because it didn't come from an HG repo, so I had to do it by hand).

MSVC seems somewhat inconsistent about its response:
>>> strftime('aaa%')
''
>>> strftime('aaaa%')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: Invalid format string

How closely do we want to simply expose the underlying C library's behaviour?
msg249967 - (view) Author: John Leitch (JohnLeitch) * Date: 2015-09-06 05:49
If it's so wildly inconsistent, it's my opinion that Python should perform its own validation to achieve better cross-platform support. The alternative is playing a never ending game of whack-a-mole, or just accepting that format strings may cause exceptions in some platforms but not others.
msg249969 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2015-09-06 06:04
Having now read over this whole issue, I don't actually see where the security vulnerability is (on Windows at least).

This is a user-mode read, so it can only access memory in the same process, and it doesn't display it anywhere. The worst that can happen is that it hits an unreadable page and crashes, which falls under "undefined behaviour due to invalid input".

I think we should just revert the patch completely.
msg249970 - (view) Author: Larry Hastings (larry) * (Python committer) Date: 2015-09-06 06:05
I have convinced myself that disallowing trailing % marks on all platforms is the right thing to do.  Whether or not the underlying platform tolerates it, it is never *valid* input to strftime.

I have a patch incubating at home.  I'll put it up for review when I get back, maybe an hour.
msg249971 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2015-09-06 06:08
I'll get my commits out of the way to make the buildbots green again.
msg249972 - (view) Author: Larry Hastings (larry) * (Python committer) Date: 2015-09-06 06:10
Oh, maybe it was all like that before.  Sorry, I was in a hurry and not in a charitable frame of mind.
msg249973 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2015-09-06 06:14
New changeset 73227e857d6b by Steve Dower in branch '3.5':
Issue #24917: Backed out changeset 09b62202d9b7
https://hg.python.org/cpython/rev/73227e857d6b

New changeset 743924064dc8 by Steve Dower in branch 'default':
Issue #24917: Backed out changeset 09b62202d9b7
https://hg.python.org/cpython/rev/743924064dc8
msg249974 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2015-09-06 06:18
Yeah, I came in late and in too much of a rush too :(

Backed out the broken changes, and now I'm off to bed. Not convinced there's anything that needs fixing here, and IIRC we argued through the different platform behaviours on the issue when I removed the null check that was preventing the overread.

If we do anything now, it should just be `if (outbuf[1] == '\0') break` to avoid the overread. strftime() is probably too far gone to make it consistent across platforms now.
msg249975 - (view) Author: John Leitch (JohnLeitch) * Date: 2015-09-06 06:34
Yes, this is a user-mode read, but I disagree with the assertion that it's not possible to use this to disclose memory. While it isn't as critical as something that outright dumps memory, there is logic that throws exceptions based on values encountered while reading outside the bounds of the buffer. This could be used as a channel to infer what is or isn't in adjacent memory. That it's user-mode doesn't matter--if an application exposes the format string as attack surface, suddenly process memory can be probed. So, it's not heartbleed, but it does have security implications. If you'd like, I can take a shot at building a PoC.

Further, it's best to err on the side of caution with bugs like these; just because it doesn't seem like major issue now doesn't mean someone won't come along in the future and prove otherwise.
msg249976 - (view) Author: Larry Hastings (larry) * (Python committer) Date: 2015-09-06 06:54
I'm going to back this out of 3.5.0rc3.  I'm still willing to discuss accepting it into 3.5.0 final in some form, but for now I don't want to hold up the release.

Steve: it should never be possible to crash the Python interpreter with well-formed Python code.  Whether or not it's possible to exploit it, I do think this crash is something the interpreter should prevent.
msg249977 - (view) Author: Eryk Sun (eryksun) * (Python triager) Date: 2015-09-06 06:57
> MSVC seems somewhat inconsistent about its response:
> >>> strftime('aaa%')
> ''

That's not due to MSVC. It's setting errno to EINVAL. The problem is that time_strftime is testing (buflen > 0 || i >= 256 * fmtlen). The initial value of the outbuf size i is 1024, so when (fmtlen <= 4), the value of (256 * fmtlen) is less than or equal to i, and it assumes "the format yields an empty result" without considering the value of errno. So instead of raising an exception for EINVAL, it calls PyUnicode_DecodeLocaleAndSize to return an empty string:

    >>> strftime('aaa%')
    Breakpoint 1 hit
    ucrtbase!strftime:
    000007fe`e2bac3e0 4883ec38        sub     rsp,38h
    0:000> gu
    python35!time_strftime+0x1f5:
    00000000`5de9c785 488bcb          mov     rcx,rbx
    0:000> be 0; g
    Breakpoint 0 hit
    ucrtbase!_errno:
    000007fe`e2b341b0 48895c2408      mov     qword ptr [rsp+8],rbx ss:00000000`0028f320=ffffffffffffffff

errno is 22:

    0:000> pt; dd @rax l1
    00000000`002ddb50  00000016
    0:000> bd 0; g
    Breakpoint 2 hit
    python35!PyUnicode_DecodeLocaleAndSize:
    00000000`5df55070 4053            push    rbx
    0:000> k 2
    Child-SP          RetAddr           Call Site
    00000000`0028f318 00000000`5de9c81a python35!PyUnicode_DecodeLocaleAndSize
    00000000`0028f320 00000000`5df1d5c2 python35!time_strftime+0x28a
    0:000> g
    ''
msg249994 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2015-09-06 13:54
Larry: agreed, no crashing. We'll get the trivial fix in that doesn't raise any error, and should probably stop suppressing the exception in the situation eryksun describes. That will correctly expose the platform's strftime semantics.

Unless you want to write more thorough validation of incoming strings, I'll work something up later today or tomorrow.
msg250014 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2015-09-06 19:50
New patch attached that just breaks out of the scanning loop and lets the system CRT handle invalid format strings. Fixes the condition that was suppressing some errors on Windows.

Also fixed the PEP 7 issues around the changed code (I believe). No new test because the crash cannot be reliably reproduced and the expected results vary dramatically between platforms.

The two if/break lines in the AIX branch should also be applied to 3.4 to prevent crashes. The rest of the fix (including the PEP 7 fixes) should not.
msg250020 - (view) Author: John Leitch (JohnLeitch) * Date: 2015-09-06 20:59
First, let me begin by saying I believe this patch will fix the buffer over-read, which is a good step forward.

However, after giving the matter more thought, and at the risk of wearing out my welcome, I am of the belief that relying on the CRT to handle malformed format strings is the wrong approach. As per the C spec, strftime's behavior when handling invalid format strings is undefined:

"If a conversion specifier is not one of the above, the behavior is undefined"

Quite often, "undefined" translates to "exploitable". And at the very least, by not performing thorough enough validation, Python is misusing strftime(), which may lead to crashes or undermine memory safety. Of course, this is all speculation--I haven't the time or resource to learn other platforms to see what's possible. But, even if I could, the task would be Sisyphean because there's simply no way to know what the future holds when dealing with implementation that could change at any point.

I realize we must be pragmatic with matters such as this, and a dramatic change could be breaking for some Python apps. Even so, I feel it's worth vocalizing these concerns. As a principal, I think that "safe", well-formed Python should never be able to perform operations that lead to undefined behavior in the underlying runtime.

Alright, rant done. If at any point in time locking down Python's strftime with more aggressive validation is considered viable, I am more than willing to take a shot at submitting a patch.
msg250022 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2015-09-06 21:19
Historically, time and os modules have been considered low level, thin wrappers over system libc functions.  Users of these modules are the proverbial "consenting adults" who should understand their power and the associated risks.

The place to provide a better behaved strftime function is the datetime module, but I would leave time.stftime pretty much the way it is and would only consider making *fewer* checks on Windows if _Py_BEGIN_SUPPRESS_IPH / _Py_END_SUPPRESS_IPH macros can indeed help to avoid crashes on illegal format strings.
msg250023 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2015-09-06 21:55
FWIW, those macros prevent a certain class of "undefined behavior," which in this case is to terminate the process rather than return an error code. They don't do anything to prevent crashes due to exploitation or malicious code.

The place for more robust formatting is datetime.__format__(), which from a quick test seems to be more strict about what is in the format string.
msg250024 - (view) Author: Larry Hastings (larry) * (Python committer) Date: 2015-09-06 22:04
Having slept on it, I agree with Steve.  We should make the minimal change necessary in order to not crash.

However, it still needs a regression test.  The test can use JohnLeitch's proposed test as a good starting point, but it must accept either success or ValueError as passing the test.

I still think there's merit in detecting a trailing unquoted % at the end of the format string, but that would only be appropriate for 3.6 at this point.
msg250030 - (view) Author: Eryk Sun (eryksun) * (Python triager) Date: 2015-09-06 22:45
With MSVC, if errno is cleared before calling strftime, then when buflen == 0 you know whether it's an invalid format string (EINVAL) or maxsize is too small (ERANGE). There's no need to guess. Oddly, only EINVAL is documented, even though setting ERANGE has been in the implementation for years. 

VC 10 (strftime.c):

            /* error - return an empty string */
            *(strstart)='\0';

            /* now return our error/insufficient buffer indication */
            if ( !failed && left <= 0 )
            {
                /* do not report this as an error to allow the caller to resize */
                errno=ERANGE;
            }
            else
            {
                _VALIDATE_RETURN( FALSE, EINVAL, 0);
            }

VC 14 (time/wcsftime.cpp):

        // Error:  return an empty string:
        *string = L'\0';

        // Now return our error/insufficient buffer indication:
        if (!failed && remaining <= 0)
        {
            // Do not report this as an error to allow the caller to resize:
            errno = ERANGE;
        }
        else
        {
            _VALIDATE_RETURN(false, EINVAL, 0);
        }
msg250031 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2015-09-06 23:14
How's this for the test (I added an explanatory comment, since it may look like it isn't testing anything to someone who isn't familiar with the issue):

    def test_strftime_format_check(self):
        # Test that strftime does not crash on invalid format strings
        # that may trigger a buffer overread. When not triggered,
        # strftime may succeed or raise ValueError depending on
        # the platform.
        for x in [ '', 'A', '%A', '%AA' ]:
            for y in range(0x0, 0x10):
                for z in [ '%', 'A%', 'AA%', '%A%', 'A%A%', '%#' ]:
                    try:
                        time.strftime(x * y + z)
                    except ValueError:
                        pass
msg250035 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2015-09-07 02:24
New PR: https://bitbucket.org/larry/cpython350/pull-requests/19/issue-24917-time_strftime-buffer-over-read
msg250039 - (view) Author: Larry Hastings (larry) * (Python committer) Date: 2015-09-07 02:55
Steve, did you confirm that the test triggers the array bounds bug when the patch *isn't* applied?  I want to confirm both that a) the test exercises the bug, and b) the fix fixes the bug.  I assume you ran the test suite with the patch applied, so that covers b).
msg250040 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2015-09-07 03:06
I wasn't able to repro the crash at all, even with the debugging flags that make this sort of issue more prominent. It relies on a very precise layout of multiple objects in memory, or possibly a specific sequence of allocations/deallocations, as well as a format string ending in an unescaped '%' or (on Windows) '%#'.

It's still obviously an issue though - we should have the check for '\0' there by any reasonably analysis of the code, or else should not be adding 2 to the pointer to start the next step of the search.
msg250041 - (view) Author: Larry Hastings (larry) * (Python committer) Date: 2015-09-07 03:26
Okay, I think *I* reproduced it.

1) I pulled your cpython350 fork down locally.

2) I updated to your checkin that fixed the bug. (c31dad22c80d)

3) I reverted the change to Modules/timemodule.c to put the bug back:
   % hg cat -r 97393 Modules/timemodule.c > Modules/timemodule.c

4) I changed line 611 (or so) from "#if defined(MS_WINDOWS) && !defined(HAVE_WCSFTIME)" to "#if 1" so I'd get the code that had the bug.

5) I ran "./configure --with-valgrind && make" to make it.

6) I ran "valgrind ./python -m test test_time" and ***Valgrind complained about an array overrun***.

7) I restored the bugfix to Modules/timemodule.c, then reinstated the change from 4) above.

8) I ran make and valgrind again and didn't get the complaint about the array overrun.

For grins I also tried enabling the other stanza of code that has the bug (the AIX / sun / have_wcsftime) and observed the same thing.

Is that convincing?
msg250042 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2015-09-07 03:36
Good enough for me.
msg250062 - (view) Author: Larry Hastings (larry) * (Python committer) Date: 2015-09-07 05:12
Backout pull request merged, please forward-merge, thanks!
msg250064 - (view) Author: Larry Hastings (larry) * (Python committer) Date: 2015-09-07 05:13
(I meant, just normal pull request.  I did your two pull requests right in a row and got my wires crossed.)
msg250071 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2015-09-07 05:38
New changeset c31dad22c80d by Steve Dower in branch '3.5':
Issue #24917: time_strftime() buffer over-read.
https://hg.python.org/cpython/rev/c31dad22c80d

New changeset f185917498ca by Steve Dower in branch '3.4':
Issue #24917: time_strftime() buffer over-read.
https://hg.python.org/cpython/rev/f185917498ca
msg250202 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2015-09-08 13:49
The change c31dad22c80d introduced a regression: issue #25029.
msg250203 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2015-09-08 13:52
Oh, tracemalloc sees the memory peak of 8 GB too:

$ ./python -X tracemalloc -i -m test -v test_strptime
...
SystemExit: True
>>> import tracemalloc; tracemalloc.get_traced_memory()[1] / 1024.**2
8201.658247947693

Memory peak:  8201.7 MB (8.2 GB!).
msg250213 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2015-09-08 15:15
I'll fix it on #25029. This thread is already too long for my liking.
History
Date User Action Args
2015-09-08 15:15:28steve.dowersetstatus: open -> closed
superseder: MemoryError in test_strptime
resolution: fixed
messages: + msg250213
2015-09-08 13:52:21vstinnersetmessages: + msg250203
2015-09-08 13:49:20vstinnersetstatus: closed -> open
resolution: fixed -> (no value)
messages: + msg250202
2015-09-07 05:38:04python-devsetmessages: + msg250071
2015-09-07 05:34:07larrysetstatus: open -> closed
resolution: fixed
stage: commit review -> resolved
2015-09-07 05:13:10larrysetmessages: + msg250064
2015-09-07 05:12:00larrysetmessages: + msg250062
2015-09-07 03:36:33steve.dowersetmessages: + msg250042
2015-09-07 03:26:12larrysetmessages: + msg250041
2015-09-07 03:06:03steve.dowersetmessages: + msg250040
2015-09-07 02:55:38larrysetmessages: + msg250039
2015-09-07 02:24:33steve.dowersetmessages: + msg250035
2015-09-06 23:14:22steve.dowersetmessages: + msg250031
2015-09-06 22:45:38eryksunsetmessages: + msg250030
2015-09-06 22:04:04larrysetmessages: + msg250024
2015-09-06 21:55:24steve.dowersetmessages: + msg250023
2015-09-06 21:19:11belopolskysetmessages: + msg250022
2015-09-06 20:59:26JohnLeitchsetmessages: + msg250020
2015-09-06 19:50:41steve.dowersetfiles: + 24917_4.patch

messages: + msg250014
versions: - Python 3.2, Python 3.3
2015-09-06 13:54:41steve.dowersetmessages: + msg249994
2015-09-06 07:52:32larrysetpriority: release blocker -> deferred blocker
2015-09-06 06:57:19eryksunsetmessages: + msg249977
2015-09-06 06:54:28larrysetmessages: + msg249976
2015-09-06 06:34:54JohnLeitchsetmessages: + msg249975
2015-09-06 06:18:38steve.dowersetmessages: + msg249974
2015-09-06 06:14:45python-devsetmessages: + msg249973
2015-09-06 06:10:27larrysetmessages: + msg249972
2015-09-06 06:08:36steve.dowersetmessages: + msg249971
2015-09-06 06:05:40larrysetmessages: + msg249970
2015-09-06 06:04:11steve.dowersetmessages: + msg249969
2015-09-06 05:49:35JohnLeitchsetmessages: + msg249967
2015-09-06 05:34:36steve.dowersetmessages: + msg249966
2015-09-06 05:30:57JohnLeitchsetmessages: + msg249965
2015-09-06 05:26:00steve.dowersetstatus: closed -> open
priority: normal -> release blocker

nosy: + georg.brandl
messages: + msg249964

resolution: fixed -> (no value)
2015-09-06 05:24:48larrysetmessages: + msg249963
2015-09-06 05:10:29larrysetmessages: + msg249962
2015-09-06 05:01:34JohnLeitchsetmessages: + msg249961
2015-09-06 04:49:19larrysetmessages: + msg249960
2015-09-06 04:08:38steve.dowersetstatus: open -> closed
messages: + msg249958

assignee: steve.dower
resolution: fixed
stage: patch review -> commit review
2015-09-06 04:02:09python-devsetnosy: + python-dev
messages: + msg249956
2015-09-06 00:13:43larrysetmessages: + msg249950
2015-09-05 19:18:07steve.dowersetmessages: + msg249920
2015-09-05 19:07:43steve.dowersetmessages: + msg249919
2015-09-05 12:27:49eryksunsetmessages: + msg249908
2015-09-05 11:51:21larrysetmessages: + msg249905
2015-09-05 11:41:17eryksunsetnosy: + eryksun
messages: + msg249904
2015-09-05 09:55:32lemburgsetmessages: + msg249899
2015-09-05 02:16:17JohnLeitchsetfiles: + time_strftime_Buffer_Over-read_v3.patch

messages: + msg249884
2015-09-05 01:49:29belopolskysetmessages: + msg249883
2015-09-05 01:29:11JohnLeitchsetfiles: + time_strftime_Buffer_Over-read_v2.patch

messages: + msg249881
2015-09-05 01:12:56belopolskysetmessages: + msg249878
2015-09-05 01:06:03larrysetmessages: + msg249877
2015-09-05 00:57:47belopolskysetnosy: + larry
messages: + msg249876
2015-09-05 00:41:34JohnLeitchsetmessages: + msg249874
2015-09-05 00:10:11belopolskysetmessages: + msg249872
components: + Extension Modules
2015-09-05 00:03:56belopolskysetmessages: + msg249870
2015-09-04 23:58:16belopolskysetmessages: + msg249868
versions: + Python 3.2, Python 3.3, Python 3.4
2015-09-04 23:47:22JohnLeitchsetmessages: + msg249867
2015-09-04 23:34:11BreamoreBoysetnosy: + steve.dower, paul.moore, tim.golden, zach.ware
messages: + msg249864
components: + Windows
2015-09-04 23:27:19belopolskysetmessages: + msg249863
2015-09-04 23:26:20BreamoreBoysetmessages: + msg249862
2015-09-04 23:23:29JohnLeitchsetmessages: + msg249861
2015-09-04 23:12:49belopolskysetmessages: + msg249860
2015-09-04 23:08:04BreamoreBoysetnosy: + BreamoreBoy
messages: + msg249857
2015-09-04 18:34:06belopolskysetmessages: + msg249806
2015-09-04 18:32:02belopolskysetnosy: + vstinner
messages: + msg249805
2015-09-04 18:18:29JohnLeitchsetmessages: + msg249801
2015-09-04 18:18:13belopolskysetstage: patch review
2015-09-04 18:15:44belopolskysetmessages: + msg249800
2015-09-04 17:43:26JohnLeitchsetnosy: + lemburg, belopolsky
2015-08-23 22:52:36brycedarlingsetnosy: + brycedarling
2015-08-22 23:40:44JohnLeitchsetfiles: + time_strftime_Buffer_Over-read.py
2015-08-22 23:39:19JohnLeitchcreate