classification
Title: Use Py_CLEAR where necessary to avoid crashes
Type: behavior Stage: patch review
Components: Interpreter Core Versions: Python 3.1, Python 3.2, Python 2.7
process
Status: closed Resolution: accepted
Dependencies: Superseder:
Assigned To: Nosy List: BreamoreBoy, arigo, ddorfman, jcea, loewis, mwh, rhettinger
Priority: normal Keywords: patch

Created on 2004-09-01 05:35 by ddorfman, last changed 2012-11-10 01:50 by jcea. This issue is now closed.

Files
File name Uploaded Description Edit
useclear.diff ddorfman, 2004-09-01 05:35 Mostly mechanical transformation to use Py_CLEAR
clearcand.py ddorfman, 2004-09-01 05:36 Utility to find broken cases
Messages (11)
msg46820 - (view) Author: Dima Dorfman (ddorfman) Date: 2004-09-01 05:35
The Py_CLEAR macro was introduced to make it less tempting
to write incorrect code in the form

  Py_DECREF(self->field);
  self->field = NULL;

because that can expose a half-destroyed object if
deallocation can cause self->field to be read. This patch
fixes mistakes like this that still exist in the core.

Only cases that are potentially dangerous are fixed in this
patch. If self->field can only reference a special kind of
[builtin] object, then it's just a style bug because we know
that the builtin object isn't evil. Likewise if the code is
operating on an automatic variable. It might be nice to fix
those style bugs anyway, to discourage the broken idiom.


Just for kicks, here's a way to use this bug in reversed to
crash the interpreter:

  import array, gc, weakref
  a = array.array('c')
  wr = weakref.ref(a, lambda _: gc.get_referents(rev))
  rev = reversed(a)
  del a
  list(rev)

For me, this crashes immediately with a debug build and
after a couple tries otherwise.


Also attached is clearcand.py to help find these cases. It's
not comprehensive, but it's a start. Usage:

  $ find src -name '*.c' | python clearcand.py | fgrep -- '->'


The patch requires SF #1020185 to be applied for genobject.c
to compile without warnings.
msg46821 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2004-09-01 07:06
Logged In: YES 
user_id=80475

Accepted and applied.
See:
    Include/object.h 2.129
    Modules/itertoolsmodule.c 1.35
    Objects/enumobject.c 1.18
    Objects/genobject.c 1.4
    Objects/iterobject.c 1.19

Thanks for the patch.
msg46822 - (view) Author: Michael Hudson (mwh) (Python committer) Date: 2004-09-01 12:32
Logged In: YES 
user_id=6656

Test cases would be nice.  If it's easy for someone whose thought 
about it, that would be great, if not assign to me and I'll get to it...
msg46823 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2004-09-08 17:33
Logged In: YES 
user_id=80475

FWIW, rather than add specific testcases (closing the 
barndoor after the horse has escaped), it might be a good 
idea to fixup the OP's search code and add it to the toolbox.

Future errors of this type are better caught through code 
scans than by testing stuff that already works fine.

Of course, if highly motivated, you can do both :-)
msg46824 - (view) Author: Dima Dorfman (ddorfman) Date: 2004-09-10 13:52
Logged In: YES 
user_id=908995

I didn't write specific tests for these since they really
would be testing for one particular line of code. It's not
really hard to write them, though; I think my sample using
reversed can be easily adapted to iterobject and itertools.
I'm not sure whether genobject can be exploited like this,
and I don't think it's worth finding out just to write a
test case.

On the other hand, if you (anyone) think it's worth it, I'll
clean up clearcand.py not to require a pipeline and it can
be added to the toolbox. For that to make sense, though, we
should probably fix the style bugs that it catches so it can
be reasonable to expect no output. Here are the outstanding
matches that dereference something using "->":

src/Modules/_bsddb.c 746 'Py_DECREF(self->myenvobj);\nself->myenvobj = 
NULL;'
src/Modules/_bsddb.c 781 'Py_DECREF(self->myenvobj);\nself->myenvobj = 
NULL;'
src/Modules/_bsddb.c 786 'Py_DECREF(self->associateCallback);
\nself->associateCallback = NULL;'
src/Modules/_bsddb.c 1184 'Py_DECREF(self->associateCallback);
\nself->associateCallback = NULL;'
src/Modules/svmodule.c 281 'Py_DECREF(self->ob_svideo);
\nself->ob_svideo = NULL;'
src/Mac/Modules/carbonevt/_CarbonEvtmodule.c 1060 
'Py_DECREF(_self->ob_callback);\n_self->ob_callback = NULL;'
src/Objects/unicodeobject.c 161 'Py_DECREF(unicode->defenc);
\nunicode->defenc = NULL;'
src/Objects/unicodeobject.c 250 'Py_DECREF(unicode->defenc);
\nunicode->defenc = NULL;'

unicode and sv are safe, and bsddb is on my list to look at.
It shouldn't be harmful to just change all of those to
Py_CLEAR, though. I can cobble up a patch for that, too.

Opinions?
msg46825 - (view) Author: Michael Hudson (mwh) (Python committer) Date: 2004-09-10 14:09
Logged In: YES 
user_id=6656

Yeah, I've come to a similar conclusion (finally got round
to thinking about this properly this morning).  I think a
cleaned up script would be a good idea.

Two improvements (which you may well have made already):
Py_XDECREF is vulnerable too (do we want Py_XCLEAR??) and
Py_DECREF(x); x == NULL is ok AFAICT when x is an identifier.
msg46826 - (view) Author: Dima Dorfman (ddorfman) Date: 2004-09-10 14:29
Logged In: YES 
user_id=908995

Okay, I'll post a cleaned up script this weekend. About your
improvements:

1. Py_CLEAR already has the same semantics as a would-be
   Py_XCLEAR--it doesn't do anything if its argument is
   null. Py_XDECREF should definitely be checked for,
   though.

2. Py_DECREF(x) is probably safe; this is what I meant when
   I talked about operating on automatic variables in the
   initial comment. The grep for "->" filters out matches
   that are safe in this manner. It should be noted that
   this isn't necessarily safe, just that it's almost
   certainly so. In particular, it's not safe in this case:

     PyObject *x = ...;
     /* self is a struct { PyObject **something; ... } */
     self->something = &x;
     Py_DECREF(x);
     x = NULL;

   but doing that is just insane.

   The grep would also miss ((*self).something), but I
   didn't expect to see any of those.
msg46827 - (view) Author: Armin Rigo (arigo) * (Python committer) Date: 2004-09-27 18:50
Logged In: YES 
user_id=4771

The grep may miss other expressions, like Py_DECREF(items[index]).  There is also the dreadful case of macros: Py_DECREF(x) used in a macro which is called with self->something as x.  In other words I am not sure the script can replace a (really motivated) programmer's extensive review...
msg46828 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2007-02-15 10:14
What's the status of this? Does anybody want to incorporate the test script?

There are still a few cases left (or readded since the patch was originally made), in Modules (_bsddb, _sqlite, sv, _CarbonEvt), and in unicodeobject.

mwh, if you lost interest, please unassign.
msg114375 - (view) Author: Mark Lawrence (BreamoreBoy) * Date: 2010-08-19 16:16
Any interest in this or can it be closed?
msg114412 - (view) Author: Michael Hudson (mwh) (Python committer) Date: 2010-08-19 21:41
I think it makes sense to close this; if problems remain they should be reported in more targeted tickets.
History
Date User Action Args
2012-11-10 01:50:57jceasetnosy: + jcea
2010-08-19 21:41:17mwhsetstatus: open -> closed

messages: + msg114412
2010-08-19 16:16:30BreamoreBoysetversions: + Python 3.1, Python 2.7, Python 3.2, - Python 2.6
nosy: + BreamoreBoy

messages: + msg114375

type: behavior
stage: patch review
2009-02-14 14:43:23ajaksu2setversions: - Python 2.5
2008-01-05 20:06:57christian.heimessetversions: + Python 2.6, Python 2.5
2004-09-01 05:35:35ddorfmancreate