Issue 17206: Py_XDECREF() expands its argument multiple times

When running my refcount static analyzer [1] on a large corpus of real-world C extension code for Python, I ran into the following construct in various places:

    Py_XDECREF(PyObject_CallObject(callable, args));

This is bogus code: Py_XDECREF expands its argument multiple times, so after this goes through the C preprocessor the callable is actually called invoked up to 4 times, leaking refs to 3 of the results - assuming that none of the calls fail, and a non pydebug build of CPython (which would expand the argument more times).

The raw preprocessor output for an optimized Python 2.7 looks like this:
   do { if ((PyObject_CallObject(callable, args)) == ((void *)0)) ; else do { if ( --((PyObject*)(PyObject_CallObject(callable, args)))->ob_refcnt != 0) ; else ( (*(((PyObject*)((PyObject *)(PyObject_CallObject(callable, args))))->ob_type)->tp_dealloc)((PyObject *)((PyObject *)(PyObject_CallObject(callable, args))))); } while (0); } while (0);

Cleaned up (and removing some of the casts for clarity) it looks like this:
  do { 
    if ((PyObject_CallObject(callable, args)) == ((void *)0))
      ; 
    else 
      do {
        if (--(PyObject_CallObject(callable, args)->ob_refcnt) != 0)
          ;
        else
          (*(PyObject_CallObject(callable, args)->ob_type)->tp_dealloc)
            PyObject_CallObject(callable, args);
      } while (0);
  } while (0);
which is clearly not what the extension author was expecting.

Should we:
  (A) update the macro so that it expands its argument only once, or
  (B) warn people not to write code like the above?

Similar considerations apply to the other macros, I guess, but I've seen the above used "in the wild", I haven't yet seen it for the other macros).

Seen in the wild in:
  python-alsa-1.0.25-1.fc17:
      pyalsa/alsamixer.c:179
    00174 | 	for (i = 0; i < count; i++) { 
    00175 | 		t = PyTuple_New(2); 
    00176 | 		if (t) { 
    00177 | 			PyTuple_SET_ITEM(t, 0, PyInt_FromLong(pfd[i].fd)); 
    00178 | 			PyTuple_SET_ITEM(t, 1, PyInt_FromLong(pfd[i].events)); 
    00179>| 			Py_XDECREF(PyObject_CallObject(reg, t)); 
    00180 | 			Py_DECREF(t); 
    00181 | 		} 
    00182 | 	}	 
    00183 |  
    00184 | 	Py_XDECREF(reg); 

      pyalsa/alsahcontrol.c:190
    00185 | 	for (i = 0; i < count; i++) { 
    00186 | 		t = PyTuple_New(2); 
    00187 | 		if (t) { 
    00188 | 			PyTuple_SET_ITEM(t, 0, PyInt_FromLong(pfd[i].fd)); 
    00189 | 			PyTuple_SET_ITEM(t, 1, PyInt_FromLong(pfd[i].events)); 
    00190>| 			Py_XDECREF(PyObject_CallObject(reg, t)); 
    00191 | 			Py_DECREF(t); 
    00192 | 		} 
    00193 | 	}	 
    00194 |  
    00195 | 	Py_XDECREF(reg); 

      pyalsa/alsaseq.c:3277
    03272 |     for (i = 0; i < count; i++) { 
    03273 |         t = PyTuple_New(2); 
    03274 |         if (t) { 
    03275 |             PyTuple_SET_ITEM(t, 0, PyInt_FromLong(pfd[i].fd)); 
    03276 |             PyTuple_SET_ITEM(t, 1, PyInt_FromLong(pfd[i].events)); 
    03277>|             Py_XDECREF(PyObject_CallObject(reg, t)); 
    03278 |             Py_DECREF(t); 
    03279 |         } 
    03280 |     } 
    03281 |  
    03282 |     Py_XDECREF(reg); 

 python-4Suite-XML-1.0.2-14.fc17:
      Ft/Xml/src/domlette/refcounts.c:80
    00075 |     /* refcount = this */ 
    00076 |     expected = 1; 
    00077 |   } 
    00078 |   else { 
    00079 |     sprintf(buf, "Unexpected object type '%.200s'" ,node->ob_type->tp_name); 
    00080>|     Py_XDECREF(PyObject_CallMethod(tester, "error", "s", buf)); 
    00081 |     return 0; 
    00082 |   } 
    00083 |  
    00084 |   sprintf(buf, "%.200s refcounts", node->ob_type->tp_name); 
    00085 |   return do_test(tester, buf, expected, node->ob_refcnt); 

[Note to self: I actually saw this because it uncovered a traceback in cpychecker, which I fixed as:
http://git.fedorahosted.org/cgit/gcc-python-plugin.git/commit/?id=99fa820487e380b74c2eda1d97facdf2ee6d2f3a ]

[1] https://gcc-python-plugin.readthedocs.org/en/latest/cpychecker.html

Title:	Py_XDECREF() expands its argument multiple times
Type:	behavior	Stage:	resolved
Components:	Interpreter Core	Versions:	Python 3.4

Status:	closed	Resolution:	fixed
Dependencies:		Superseder:
Assigned To:		Nosy List:	Arfrever, Mark.Shannon, amaury.forgeotdarc, asvetlov, benjamin.peterson, brett.cannon, dmalcolm, ilblackdragon, jcea, jkloth, larry, lemburg, mark.dickinson, python-dev, vstinner
Priority:	normal	Keywords:	patch

Files
File name	Uploaded	Description	Edit
17206.diff	ilblackdragon, 2013-03-18 06:36	Patch for 17206	review
perf.log	ilblackdragon, 2013-03-18 06:47	Benchmark comparison for 17206 patch
perf-linux.log	ilblackdragon, 2013-03-21 02:40	Benchmark comparison for 17206 in Ubuntu + GCC
asmdiff_17206.zip	ilblackdragon, 2013-03-21 02:51	Preprocessed, object and disassembled files of ceval module with and without patch.
17206-2.diff	ilblackdragon, 2013-03-31 05:10	Updated patch for 17206	review
17206-3.diff	ilblackdragon, 2013-04-03 05:07		review
object.patch	jkloth, 2013-06-17 19:10	Patch reusing _py_tmp	review
object-2.patch	jkloth, 2013-06-24 17:20		review
project.diff	jkloth, 2013-07-02 15:28		review

Pull Requests
URL	Status	Linked	Edit
PR 30855		Mark.Shannon, 2022-01-30 17:51

Messages (48)
msg182107 - (view)	Author: Dave Malcolm (dmalcolm)	Date: 2013-02-14 16:07
When running my refcount static analyzer [1] on a large corpus of real-world C extension code for Python, I ran into the following construct in various places: Py_XDECREF(PyObject_CallObject(callable, args)); This is bogus code: Py_XDECREF expands its argument multiple times, so after this goes through the C preprocessor the callable is actually called invoked up to 4 times, leaking refs to 3 of the results - assuming that none of the calls fail, and a non pydebug build of CPython (which would expand the argument more times). The raw preprocessor output for an optimized Python 2.7 looks like this: do { if ((PyObject_CallObject(callable, args)) == ((void )0)) ; else do { if ( --((PyObject)(PyObject_CallObject(callable, args)))->ob_refcnt != 0) ; else ( ((((PyObject)((PyObject )(PyObject_CallObject(callable, args))))->ob_type)->tp_dealloc)((PyObject )((PyObject )(PyObject_CallObject(callable, args))))); } while (0); } while (0); Cleaned up (and removing some of the casts for clarity) it looks like this: do { if ((PyObject_CallObject(callable, args)) == ((void )0)) ; else do { if (--(PyObject_CallObject(callable, args)->ob_refcnt) != 0) ; else ((PyObject_CallObject(callable, args)->ob_type)->tp_dealloc) PyObject_CallObject(callable, args); } while (0); } while (0); which is clearly not what the extension author was expecting. Should we: (A) update the macro so that it expands its argument only once, or (B) warn people not to write code like the above? Similar considerations apply to the other macros, I guess, but I've seen the above used "in the wild", I haven't yet seen it for the other macros). Seen in the wild in: python-alsa-1.0.25-1.fc17: pyalsa/alsamixer.c:179 00174 \| for (i = 0; i < count; i++) { 00175 \| t = PyTuple_New(2); 00176 \| if (t) { 00177 \| PyTuple_SET_ITEM(t, 0, PyInt_FromLong(pfd[i].fd)); 00178 \| PyTuple_SET_ITEM(t, 1, PyInt_FromLong(pfd[i].events)); 00179>\| Py_XDECREF(PyObject_CallObject(reg, t)); 00180 \| Py_DECREF(t); 00181 \| } 00182 \| } 00183 \| 00184 \| Py_XDECREF(reg); pyalsa/alsahcontrol.c:190 00185 \| for (i = 0; i < count; i++) { 00186 \| t = PyTuple_New(2); 00187 \| if (t) { 00188 \| PyTuple_SET_ITEM(t, 0, PyInt_FromLong(pfd[i].fd)); 00189 \| PyTuple_SET_ITEM(t, 1, PyInt_FromLong(pfd[i].events)); 00190>\| Py_XDECREF(PyObject_CallObject(reg, t)); 00191 \| Py_DECREF(t); 00192 \| } 00193 \| } 00194 \| 00195 \| Py_XDECREF(reg); pyalsa/alsaseq.c:3277 03272 \| for (i = 0; i < count; i++) { 03273 \| t = PyTuple_New(2); 03274 \| if (t) { 03275 \| PyTuple_SET_ITEM(t, 0, PyInt_FromLong(pfd[i].fd)); 03276 \| PyTuple_SET_ITEM(t, 1, PyInt_FromLong(pfd[i].events)); 03277>\| Py_XDECREF(PyObject_CallObject(reg, t)); 03278 \| Py_DECREF(t); 03279 \| } 03280 \| } 03281 \| 03282 \| Py_XDECREF(reg); python-4Suite-XML-1.0.2-14.fc17: Ft/Xml/src/domlette/refcounts.c:80 00075 \| / refcount = this */ 00076 \| expected = 1; 00077 \| } 00078 \| else { 00079 \| sprintf(buf, "Unexpected object type '%.200s'" ,node->ob_type->tp_name); 00080>\| Py_XDECREF(PyObject_CallMethod(tester, "error", "s", buf)); 00081 \| return 0; 00082 \| } 00083 \| 00084 \| sprintf(buf, "%.200s refcounts", node->ob_type->tp_name); 00085 \| return do_test(tester, buf, expected, node->ob_refcnt); [Note to self: I actually saw this because it uncovered a traceback in cpychecker, which I fixed as: http://git.fedorahosted.org/cgit/gcc-python-plugin.git/commit/?id=99fa820487e380b74c2eda1d97facdf2ee6d2f3a ] [1] https://gcc-python-plugin.readthedocs.org/en/latest/cpychecker.html
msg182111 - (view)	Author: Brett Cannon (brett.cannon) *	Date: 2013-02-14 16:32
On Thu, Feb 14, 2013 at 11:07 AM, Dave Malcolm <report@bugs.python.org>wrote: > > New submission from Dave Malcolm: > > When running my refcount static analyzer [1] on a large corpus of > real-world C extension code for Python, I ran into the following construct > in various places: > > Py_XDECREF(PyObject_CallObject(callable, args)); > Eww. > > This is bogus code: Py_XDECREF expands its argument multiple times, so > after this goes through the C preprocessor the callable is actually called > invoked up to 4 times, leaking refs to 3 of the results - assuming that > none of the calls fail, and a non pydebug build of CPython (which would > expand the argument more times). > > The raw preprocessor output for an optimized Python 2.7 looks like this: > do { if ((PyObject_CallObject(callable, args)) == ((void )0)) ; else > do { if ( --((PyObject)(PyObject_CallObject(callable, args)))->ob_refcnt > != 0) ; else ( ((((PyObject)((PyObject )(PyObject_CallObject(callable, > args))))->ob_type)->tp_dealloc)((PyObject )((PyObject > )(PyObject_CallObject(callable, args))))); } while (0); } while (0); > > Cleaned up (and removing some of the casts for clarity) it looks like this: > do { > if ((PyObject_CallObject(callable, args)) == ((void )0)) > ; > else > do { > if (--(PyObject_CallObject(callable, args)->ob_refcnt) != 0) > ; > else > ((PyObject_CallObject(callable, args)->ob_type)->tp_dealloc) > PyObject_CallObject(callable, args); > } while (0); > } while (0); > which is clearly not what the extension author was expecting. > > Should we: > (A) update the macro so that it expands its argument only once, or > How expensive is that going to be? Since there is a 'do' statement using a temp variable would be easy to introduce, but is a compiler going to be smart enough to inline it all so it doesn't have to waste an allocation, etc.? > (B) warn people not to write code like the above? > Only if (A) is too costly. I would modify a checkout for ALL refcount macros and run the benchmark suite to see if there is a noticeable difference. If there isn't then I say change all the macros (can't make one safe and the rest not as that is just asking for inconsistent usage and trouble). -Brett > > Similar considerations apply to the other macros, I guess, but I've seen > the above used "in the wild", I haven't yet seen it for the other macros). > > Seen in the wild in: > python-alsa-1.0.25-1.fc17: > pyalsa/alsamixer.c:179 > 00174 \| for (i = 0; i < count; i++) { > 00175 \| t = PyTuple_New(2); > 00176 \| if (t) { > 00177 \| PyTuple_SET_ITEM(t, 0, > PyInt_FromLong(pfd[i].fd)); > 00178 \| PyTuple_SET_ITEM(t, 1, > PyInt_FromLong(pfd[i].events)); > 00179>\| Py_XDECREF(PyObject_CallObject(reg, t)); > 00180 \| Py_DECREF(t); > 00181 \| } > 00182 \| } > 00183 \| > 00184 \| Py_XDECREF(reg); > > pyalsa/alsahcontrol.c:190 > 00185 \| for (i = 0; i < count; i++) { > 00186 \| t = PyTuple_New(2); > 00187 \| if (t) { > 00188 \| PyTuple_SET_ITEM(t, 0, > PyInt_FromLong(pfd[i].fd)); > 00189 \| PyTuple_SET_ITEM(t, 1, > PyInt_FromLong(pfd[i].events)); > 00190>\| Py_XDECREF(PyObject_CallObject(reg, t)); > 00191 \| Py_DECREF(t); > 00192 \| } > 00193 \| } > 00194 \| > 00195 \| Py_XDECREF(reg); > > pyalsa/alsaseq.c:3277 > 03272 \| for (i = 0; i < count; i++) { > 03273 \| t = PyTuple_New(2); > 03274 \| if (t) { > 03275 \| PyTuple_SET_ITEM(t, 0, PyInt_FromLong(pfd[i].fd)); > 03276 \| PyTuple_SET_ITEM(t, 1, > PyInt_FromLong(pfd[i].events)); > 03277>\| Py_XDECREF(PyObject_CallObject(reg, t)); > 03278 \| Py_DECREF(t); > 03279 \| } > 03280 \| } > 03281 \| > 03282 \| Py_XDECREF(reg); > > python-4Suite-XML-1.0.2-14.fc17: > Ft/Xml/src/domlette/refcounts.c:80 > 00075 \| / refcount = this */ > 00076 \| expected = 1; > 00077 \| } > 00078 \| else { > 00079 \| sprintf(buf, "Unexpected object type '%.200s'" > ,node->ob_type->tp_name); > 00080>\| Py_XDECREF(PyObject_CallMethod(tester, "error", "s", buf)); > 00081 \| return 0; > 00082 \| } > 00083 \| > 00084 \| sprintf(buf, "%.200s refcounts", node->ob_type->tp_name); > 00085 \| return do_test(tester, buf, expected, node->ob_refcnt); > > > [Note to self: I actually saw this because it uncovered a traceback in > cpychecker, which I fixed as: > > http://git.fedorahosted.org/cgit/gcc-python-plugin.git/commit/?id=99fa820487e380b74c2eda1d97facdf2ee6d2f3a] > > > [1] https://gcc-python-plugin.readthedocs.org/en/latest/cpychecker.html > > ---------- > components: Interpreter Core > messages: 182107 > nosy: dmalcolm > priority: normal > severity: normal > status: open > title: Py_XDECREF() expands its argument multiple times > versions: Python 2.7, Python 3.1, Python 3.2, Python 3.3, Python 3.4, > Python 3.5 > > _______________________________________ > Python tracker <report@bugs.python.org> > <http://bugs.python.org/issue17206> > _______________________________________ > _______________________________________________ > New-bugs-announce mailing list > New-bugs-announce@python.org > http://mail.python.org/mailman/listinfo/new-bugs-announce >
msg182209 - (view)	Author: Benjamin Peterson (benjamin.peterson) *	Date: 2013-02-16 04:12
I wish we could use inline functions...
msg184430 - (view)	Author: Illia Polosukhin (ilblackdragon) *	Date: 2013-03-18 06:36
I've worked on this with Dave Malcolm @PyCon2013 sprints. This patch is work in progress to make Py_XDECREF() and Py_XINCREF() expands their arguments once instead of multiple times. Because patch is work in progress, it contains old version for ease of benchmarking. To enable the old version - define OLD_17206. Tested so far only with: Clang on Mac OS X 10.7 x64.
msg184431 - (view)	Author: Illia Polosukhin (ilblackdragon) *	Date: 2013-03-18 06:46
Additionally, in macros Py_XINCREF and Py_XDECREF we've took an opportunity to increase readability by changing expression: > if (item == NULL) ; else action(item); to more natural inverted condition: > if (item != NULL) action(item); There is a chance that there may be a reason for this form of expression, so let me know if this is an issue.
msg184432 - (view)	Author: Illia Polosukhin (ilblackdragon) *	Date: 2013-03-18 06:47
Benchmark run on Clang Mac OS X 10.7 attached of comparison with and without patch 17206.diff.
msg184433 - (view)	Author: Illia Polosukhin (ilblackdragon) *	Date: 2013-03-18 06:49
Command used for benchmarking was: python perf.py -b 2n3 -f ../cpython/baseline-clang/python.exe ../cpython/experiment-clang/python.exe \| tee perf.log
msg184488 - (view)	Author: Mark Dickinson (mark.dickinson) *	Date: 2013-03-18 18:43
I'd suggest less generic names for the temporary variables, just to minimise the chance of accidental name collisions. _py_xincref_tmp and _py_xdecref_tmp, perhaps?
msg184492 - (view)	Author: Illia Polosukhin (ilblackdragon) *	Date: 2013-03-18 18:59
That names were my first idea - but then I saw Py_CLEAR uses _py_tmp variable I used it. Should I replace in Py_CLEAR to _py_clear_tmp as well?
msg184495 - (view)	Author: Mark Dickinson (mark.dickinson) *	Date: 2013-03-18 19:15
Ignore me; I don't know what I was thinking. _py_tmp is fine. There's no issue with name clashes.
msg184833 - (view)	Author: Illia Polosukhin (ilblackdragon) *	Date: 2013-03-21 02:40
Compiled baseline and patched version with GCC in Ubuntu 12.10 (running in VMWare). Benchmarking results are attached: ./perf.py -b 2n3 -f ../cpython/baseline/python ../cpython/experiment/python \| tee perf-linux.log
msg184834 - (view)	Author: Illia Polosukhin (ilblackdragon) *	Date: 2013-03-21 02:51
Additionally, disassembled ceval.o and compared baseline with experiment (with applied patch): no actual differences found. Attached archive contains ceval from both baseline and experiment builds: - cevalb.cc - processed file of baseline build - ceval.cc - processed file of experiment build diff ceval.cc cevalb.cc - shows differences introduced by patch. - cevalb.o - object file of baseline build - ceval.o - object file of experiment build - cevalb.asm - disassembly of baseline object file - ceval.asm - disassembly of experiment object file diff ceval.asm cevalb.asm - doesn't show any differences in generated assembly code.
msg184855 - (view)	Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) *	Date: 2013-03-21 07:44
- The patch contains a "#ifdef OLD_17206" that should be removed. - I know that these macros are already used everywhere, but a test for the new feature would be nice (in _testcapimodule.c) For example, Py_DECREF(PyLong_FromLong(0)) does not fail or leak when tests are run with -R.
msg185396 - (view)	Author: STINNER Victor (vstinner) *	Date: 2013-03-28 01:13
I'm not sure that it is a good idea to patch such major function (macro) in a minor version (ex: Python 2.7.x). I changed the Versions field to only select Python 3.4.
msg185614 - (view)	Author: Illia Polosukhin (ilblackdragon) *	Date: 2013-03-31 05:10
Updated patch - removed old code and addded tests to _testcapimoudule.c as Amaury suggested. Amaury, why do you mention -R flag (from what I see it does hash randomization)? I would expect some flag, that would enforce memory leakage issues to fail test.
msg185623 - (view)	Author: Mark Dickinson (mark.dickinson) *	Date: 2013-03-31 11:05
> Amaury, why do you mention -R flag He's talking about the -R argument to regrtest: """ -R runs each test several times and examines sys.gettotalrefcount() to see if the test appears to be leaking references. The argument should be of the form stab:run:fname where 'stab' is the number of times the test is run to let gettotalrefcount settle down, 'run' is the number of times further it is run and 'fname' is the name of the file the reports are written to. These parameters all have defaults (5, 4 and "reflog.txt" respectively), and the minimal invocation is '-R :'. """
msg185625 - (view)	Author: Mark Dickinson (mark.dickinson) *	Date: 2013-03-31 11:56
Thanks for the updated patch. Some comments: - In the tests, your functions should have return type PyObject* rather than void; you can use Py_RETURN_NONE as a convenient way to return something of the correct type. E.g.: static PyObject * test_decref_doesnt_leak(PyObject* self) { Py_DECREF(PyLong_FromLong(0)); Py_RETURN_NONE; } - Unfortunately, the test above will leak: even though your patch fixes Py_XDECREF, Py_DECREF will still evaluate the argument multiple times. So we get, for example: iwasawa:cpython mdickinson$ ./python.exe -m test -R:: test_capi [1/1] test_capi beginning 9 repetitions 123456789 ......... test_capi leaked [1, 1, 1, 1] references, sum=4 1 test failed: test_capi - It would be great to have tests for Py_INCREF and Py_XINCREF, too. - The patch introduces two extra blank lines before Py_CLEAR. - Not specifically about the patch, but: there's a comment in object.h that reads: """ * WARNING* The Py_DECREF macro must have a side-effect-free argument since it may evaluate its argument multiple times. (The alternative would be to mace it a proper function or assign it to a global temporary variable first, both of which are slower; and in a multi-threaded environment the global variable trick is not safe.) """ I don't understand why the author of that comment ruled out the possibility of a local temporary variable. Anyone have any ideas? A constraint of some flavours of pre-ANSI C, perhaps?
msg185627 - (view)	Author: Mark Shannon (Mark.Shannon) *	Date: 2013-03-31 14:25
Mark, it was written 20 years ago. Who knows (or cares) why it was written that way? Let's just write it correctly this time. Py_INCREF, Py_DECREF, Py_XDECREF and Py_XINCREF should all expand their argument exactly once. Py_CLEAR should expand its argument exactly twice, once as an rvalue, then once as an lvalue. So could we fix Py_DECREF and Py_CLEAR as well, please?
msg185643 - (view)	Author: Larry Hastings (larry) *	Date: 2013-03-31 19:29
For the record: I care. Generally speaking CPython is a lovingly crafted source tree, and the choices its architects made are nearly always sensible and well-reasoned. When I see things like this, things that seem kind of dumb on first glance, I worry that we're missing something. It could be that it did have a sensible reason fifteen years ago, like some bone-headed compiler generated awful code--and the macros are used everywhere, so maybe performance genuinely suffered. Or maybe the macros were originally intended to be usable as rvalues; it looks like Py_INCREF still permits that, and changing it to be a statement could be considered a breaking API change. But we can't let ourselves become hidebound and superstitious. I see no reason why a local temporary variable shouldn't be fine here, and perhaps in all the macros Mark cites. So don't get me wrong, I'm +1. (The documentation on Py_INCREF makes no mention of whether it's legal as an rvalue, and all the examples use it as a statement. Perhaps we should document it explicitly so for 3.4.)
msg185689 - (view)	Author: Marc-Andre Lemburg (lemburg) *	Date: 2013-03-31 22:30
On 31.03.2013 21:29, Larry Hastings wrote: > > Larry Hastings added the comment: > > For the record: I care. Generally speaking CPython is a lovingly > crafted source tree, and the choices its architects made are nearly > always sensible and well-reasoned. When I see things like this, things > that seem kind of dumb on first glance, I worry that we're missing > something. Did someone test the patch on a use case where the performance of this macro matters a lot ? E.g. deallocating large lists or dictionaries. Todays CPUs have plenty of registers available, so it's possible that the patch actually speeds up things, since the decref'ed variable may well be an indirect reference. At the time Py_DECREF was written, registers were precious golden flasks, only to be filled with the most valuable of things ;-) Just want to be sure we're not missing something...
msg185889 - (view)	Author: Illia Polosukhin (ilblackdragon) *	Date: 2013-04-03 05:07
Fixed Py_DECREF and Py_CLEAR as well. Added tests for Py_INCREF and Py_XINCREF (if somebody has a better idea how to tests that INCREF doesn't leak - please, let me know). Removed comment that Py_DECREF evaluate it's argument multiple times as not relevant anymore. About considerations from performance point of view - I've made toy example (only this defines and main function) to test how gcc optimizer behaves in different cases - from what I see, if expression is like this (which is majority of cases in the code): PyObject* obj = Foo(); Py_XDECREF(obj) assembly code that will be produced (with -O3) is the same before and after patch.
msg186386 - (view)	Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) *	Date: 2013-04-09 06:42
As I wrote in issue17589, there are some extension modules (pytables) that which assume that Py_INCREF is an expression: return Py_INCREF(x), x; and Py_RETURN_NONE is also defined with a comma expression. Oh, and Cython: #define __Pyx_PyBool_FromLong(b) ((b) ? (Py_INCREF(Py_True), Py_True) : (Py_INCREF(Py_False), Py_False))
msg189362 - (view)	Author: Illia Polosukhin (ilblackdragon) *	Date: 2013-05-16 12:14
Amaury, I didn't update Py_INCREF macro in this patch (because it doesn't expand it's argument multiple times) - so the examples you are showing will be working fine. I've updated Py_XINCREF, but it can't be used as an expression anyway.
msg189379 - (view)	Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) *	Date: 2013-05-16 13:27
The last patch (17206-3.diff) has tests for the 4 macros, and looks good to me.
msg190156 - (view)	Author: Roundup Robot (python-dev)	Date: 2013-05-27 21:46
New changeset aff41a6421c2 by Benjamin Peterson in branch 'default': don't expand the operand to Py_XINCREF/XDECREF/CLEAR/DECREF multiple times (closes #17206) http://hg.python.org/cpython/rev/aff41a6421c2
msg190623 - (view)	Author: STINNER Victor (vstinner) *	Date: 2013-06-04 22:29
"AMD64 Windows7 SP1 3.x" buildbot is crashing in test_marshal. It looks like a regression introduced by the changeset aff41a6421c2: [ 88/375] test_marshal Traceback (most recent call last): File "../lib/test/regrtest.py", line 1611, in <module> main_in_temp_cwd() File "../lib/test/regrtest.py", line 1586, in main_in_temp_cwd main() File "../lib/test/regrtest.py", line 774, in main raise Exception("Child error on {}: {}".format(test, result[1])) Exception: Child error on test_marshal: Exit code 3221225725 http://buildbot.python.org/all/builders/AMD64%20Windows7%20SP1%203.x/builds/2009/steps/test/logs/stdio
msg190628 - (view)	Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) *	Date: 2013-06-04 22:48
Hum, a stack overflow?
msg191364 - (view)	Author: Jeremy Kloth (jkloth) *	Date: 2013-06-17 19:10
Indeed, after debugging, it is a stack overflow caused by the introduction of the different temporary variables in the Py_XINCREF, Py_XDECREF, and Py_DECREF macros. I've attached an updated patch that reuses the _py_tmp variable for those temporary assignments thus keeping the required stack size down.
msg191731 - (view)	Author: STINNER Victor (vstinner) *	Date: 2013-06-23 21:01
> I've attached an updated patch that reuses the _py_tmp variable for those temporary assignments thus keeping the required stack size down. I don't understand how it would change the size of the C stack, could you please explain? object.patch looks like refactoring to simplify the work of the C preprocessor.
msg191752 - (view)	Author: Jeremy Kloth (jkloth) *	Date: 2013-06-24 11:19
At least in a debug build, the MSVC 64-bit compiler seems to allocate space for each unique variable declared in the function body. Therefore, by changing the temporary variables to be named identically, the amount of required space is minimized. The refactoring of Py_DECREF is needed to prevent an error for local variable defined before use when Py_DECREF is nested within another macro already defining _py_tmp (it would expand to roughly PyObject *_py_tmp = _py_tmp).
msg191780 - (view)	Author: Benjamin Peterson (benjamin.peterson) *	Date: 2013-06-24 16:04
We've "fixed" this debug-problem in other cases simply by skipping the test in question on debug builds.
msg191785 - (view)	Author: Jeremy Kloth (jkloth) *	Date: 2013-06-24 16:44
Except, in this case, it actually crashes the interpreter. I would hope to think that it isn't common practice to just hide crashers especially when there is a fix available.
msg191786 - (view)	Author: Benjamin Peterson (benjamin.peterson) *	Date: 2013-06-24 16:47
Could you please wrap your new macros in the do .. while boilerplate?
msg191791 - (view)	Author: Jeremy Kloth (jkloth) *	Date: 2013-06-24 17:20
FYI, however, the new macro is designed as an internal implementation detail along the lines of the other _Py_* macros. That is, just a tiny piece of a larger function to be used at your own risk. Either way, I've uploaded another version with the do { } while (0) wrapping.
msg191792 - (view)	Author: Benjamin Peterson (benjamin.peterson) *	Date: 2013-06-24 17:47
Yes, but in case someone else mindlessly starts using it elsewhere, let's spare them the macro bugz.
msg191877 - (view)	Author: Mark Shannon (Mark.Shannon) *	Date: 2013-06-25 19:15
I think blaming the crash in test_marshall http://buildbot.python.org/all/builders/AMD64%20Windows7%20SP1%203.x/builds/2009/steps/test/logs/stdio on the new macros (change aff41a6421c2) is correct. It may be the proximate cause, but that doesn't make it the underlying one. The interpreter should be more resilient against minor changes to the amount of C stack used per call. Can we get a full stack trace (for the C stack not the Python one) for the failure?
msg191878 - (view)	Author: Roundup Robot (python-dev)	Date: 2013-06-25 19:25
New changeset c31bec42e411 by Victor Stinner in branch 'default': Issue #17206: test.regrtest and test.script_helper enable faulthandler module http://hg.python.org/cpython/rev/c31bec42e411
msg191879 - (view)	Author: Jeremy Kloth (jkloth) *	Date: 2013-06-25 19:44
I'm unsure as to how to get a stack trace from Visual Studio, but I can assure you that the changes introduced by the change aff41a6421c2 is the cause of the failure in test_marshal. In a nutshell, the overflow happens in a recursion on the marhsal.c:r_object() function for the test_marshal test of BugsTestCase.test_loads_recursion().
msg191880 - (view)	Author: Roundup Robot (python-dev)	Date: 2013-06-25 19:54
New changeset 1a9367d5aabc by Victor Stinner in branch 'default': Issue #17206: Fix test_cmd_line and test_faulthandler for my previous change http://hg.python.org/cpython/rev/1a9367d5aabc
msg191881 - (view)	Author: STINNER Victor (vstinner) *	Date: 2013-06-25 19:58
> Can we get a full stack trace (for the C stack not the Python one) for the failure? Yes, using a debugger on Windows x64. I'm waiting for a renewal of my MSDN account to get Visual Studio (full version, not the Express version limited to 32-bit). Until this, I enabled faulthandler in subprocesses created by test.regrtest and test.script_helper because it may help this issue, but also others future crashes.
msg191883 - (view)	Author: STINNER Victor (vstinner) *	Date: 2013-06-25 20:12
"I enabled faulthandler in subprocesses created by test.regrtest and test.script_helper..." Oh, it doesn't help for this issue. On Windows, faulthandler is unable to dump the Python traceback on a stack overflow. On UNIX, it allocates a diffrent stack for its signal handler, and so the signal handler can run even on a stack overflow. The Windows exit code 0xC00000FD (3221225725) means "Stack overflow / exhaustion". [ 81/375] test_marshal Traceback (most recent call last): File "../lib/test/regrtest.py", line 1618, in <module> main_in_temp_cwd() File "../lib/test/regrtest.py", line 1593, in main_in_temp_cwd main() File "../lib/test/regrtest.py", line 775, in main raise Exception("Child error on {}: {}".format(test, result[1])) Exception: Child error on test_marshal: Exit code 3221225725
msg191888 - (view)	Author: Jeremy Kloth (jkloth) *	Date: 2013-06-25 22:00
Here are some results after a detailed investigation: - in debug, each variable declaration in a block is allocated on the stack - in debug, r_object() uses 1416 bytes of stack - in release, r_object() uses 512 bytes of stack - default stack size is 1MB - stack for python_d.exe has been already up'ed to 2100000 So it seems that the best option would be to increase the stack size used when linking (/STACK:). I would suggest increasing it to 3MB using /STACK:3145728.
msg191919 - (view)	Author: Jeremy Kloth (jkloth) *	Date: 2013-06-26 21:46
Here is some additional analysis of recursive functions in the 'pythoncore' MSVC project: Ratio Release Debug Filename:Function Name Stack Stack ----------------------------------------------- 1.000 32 32 _sre.asm:_validate_inner 1.159 352 408 ast.asm:set_context 1.167 48 56 compile.asm:stackdepth_walk 1.188 128 152 memoryobject.asm:copy_rec 1.250 160 200 ast.asm:num_stmts 1.308 104 136 firstsets.asm:calcfirstset 1.312 128 168 posixmodule.asm:win32_xstat_impl_w 1.400 40 56 node.asm:freechildren 1.667 72 120 memoryobject.asm:tolist_rec 1.750 32 56 listnode.asm:list1node 1.750 32 56 parsermodule.asm:parser_compare_nodes 1.750 32 56 parsermodule.asm:validate_factor 1.750 32 56 parsermodule.asm:validate_not_test 1.750 32 56 rotatingtree.asm:RotatingTree_Enum 1.750 32 56 typeobject.asm:solid_base 1.786 112 200 bytearrayobject.asm:bytearray_setslice 1.900 80 152 compile.asm:compiler_comprehension_generator 2.143 56 120 pythonrun.asm:print_exception_recursive 2.167 48 104 arraymodule.asm:array_ass_slice 2.250 32 72 ast.asm:validate_slice 2.250 32 72 getargs.asm:skipitem 2.250 32 72 node.asm:sizeofchildren 2.250 32 72 parsermodule.asm:validate_test 2.250 32 72 setobject.asm:set_issuperset 2.278 144 328 listobject.asm:list_ass_slice 2.417 96 232 arraymodule.asm:array_ass_subscr 2.429 56 136 longobject.asm:PyLong_AsLongLong 2.429 56 136 parsermodule.asm:node2tuple 2.429 56 136 typeobject.asm:mro_subclasses 2.500 80 200 bytearrayobject.asm:bytearray_ass_subscript 2.600 40 104 errors.asm:PyErr_GivenExceptionMatches 2.750 32 88 import.asm:update_code_filenames 2.750 32 88 setobject.asm:set_richcompare 2.750 32 88 symtable.asm:symtable_visit_slice 2.750 32 88 typeobject.asm:PyType_Modified 2.766 512 1416 marshal.asm:r_object 3.125 64 200 longobject.asm:long_rshift 3.250 32 104 compile.asm:compiler_with 3.250 32 104 setobject.asm:set_issubset 3.250 32 104 typeobject.asm:assign_version_tag 3.750 32 120 abstract.asm:abstract_issubclass 3.750 32 120 typeobject.asm:PyType_Ready 3.833 48 184 ast.asm:ast_for_expr 4.250 32 136 compile.asm:compiler_visit_expr 4.250 32 136 mathmodule.asm:factorial_partial_product 4.500 80 360 genobject.asm:gen_throw 4.692 312 1464 symtable.asm:symtable_visit_stmt 5.250 32 168 _collectionsmodule.asm:deque_extend 5.250 32 168 _collectionsmodule.asm:deque_extendleft 5.250 32 168 ast.asm:alias_for_import_name 5.750 32 184 typeobject.asm:merge_class_dict 6.250 32 200 abstract.asm:PyObject_IsInstance 6.250 32 200 abstract.asm:PyObject_IsSubclass 6.250 32 200 ast.asm:validate_expr 6.500 48 312 Python-ast.asm:obj2ast_slice 7.182 88 632 parsermodule.asm:build_node_children 7.250 32 232 errors.asm:PyErr_NormalizeException 9.167 48 440 symtable.asm:symtable_visit_expr 10.250 32 328 _json.asm:encoder_listencode_obj 10.344 256 2648 Python-ast.asm:obj2ast_expr 15.955 176 2808 Python-ast.asm:obj2ast_stmt 31.750 32 1016 Python-ast.asm:ast2obj_expr
msg192167 - (view)	Author: STINNER Victor (vstinner) *	Date: 2013-07-02 00:16
"So it seems that the best option would be to increase the stack size used when linking (/STACK:). I would suggest increasing it to 3MB using /STACK:3145728." Does Python allocate the whole stack (physical pages) at startup, or is it done on demand as on Linux?
msg192204 - (view)	Author: Jeremy Kloth (jkloth) *	Date: 2013-07-02 15:28
The /STACK:reserve[,commit] linker option uses 'reserve' as the upper limit of bytes to allow the executable to allocate. The optional 'commit' value (default of 4096) is the amount of physical memory allocated as needed. Therefore the reserve could be as large as needed to prevent stack overflows without impacting the actually memory footprint of the executable. Maybe we should consider increasing the reserve to match that of Linux (8MB)? From the list I posted above, the huge increase in stack use in Python-ast.c could lead to other overflows for certain expressions. Attached is the patch to the project file to increase the stack reserve to 4MB. The actual value doesn't impact the memory requirements of the debug build except in edge cases that would otherwise result in stack overflow exceptions.
msg192205 - (view)	Author: Jeremy Kloth (jkloth) *	Date: 2013-07-02 15:32
I forgot to note that with the increased stack reserve my previous patches to Py_DECREF() and related macros are no longer needed.
msg192206 - (view)	Author: STINNER Victor (vstinner) *	Date: 2013-07-02 15:45
Jeremy: "stack for python_d.exe has been already up'ed to 2100000" Indeed, but not recently (changeset 97361d917d22): << Issue 2286: bump up the stack size of the 64-bit debug python_d.exe to 2100000. The default value of 200000 causes a stack overflow at 1965 iterations of r_object() in marshal.c, 35 iterations before the 2000 limit enforced by MAX_MARSHAL_STACK_DEPTH. >> Oh it looks like it was the same issue: "Stack overflow exception caused by test_marshal on Windows x64". See issue #2286. Amaury wrote in this issue (in 2008, msg63540): "Can you try to raise the stack size on x64 builds? If 2Mb is enough for 32bit, 4Mb should be good in your case." Jeremy: "Therefore the reserve could be as large as needed to prevent stack overflows without impacting the actually memory footprint of the executable. Maybe we should consider increasing the reserve to match that of Linux (8MB)?" Ah ok, if the memory footprint is unchanged, yes please increase the arbitrary limit. I would suggest 8 MB.
msg192507 - (view)	Author: Roundup Robot (python-dev)	Date: 2013-07-07 00:50
New changeset fcf079dece0d by Victor Stinner in branch 'default': Issue #17206: On Windows, increase the stack size from 2 MB to 4.2 MB to fix http://hg.python.org/cpython/rev/fcf079dece0d

History
Date	User	Action	Args
2022-04-11 14:57:41	admin	set	github: 61408
2022-01-30 17:51:28	Mark.Shannon	set	pull_requests: + pull_request29209
2016-09-25 15:52:53	SilentGhost	set	status: open -> closed type: behavior resolution: fixed
2013-07-07 00:50:07	python-dev	set	messages: + msg192507
2013-07-02 15:45:01	vstinner	set	messages: + msg192206
2013-07-02 15:32:57	jkloth	set	messages: + msg192205
2013-07-02 15:28:24	jkloth	set	files: + project.diff messages: + msg192204
2013-07-02 00:16:12	vstinner	set	messages: + msg192167
2013-06-26 21:46:11	jkloth	set	messages: + msg191919
2013-06-25 22:00:32	jkloth	set	messages: + msg191888
2013-06-25 20:12:08	vstinner	set	messages: + msg191883
2013-06-25 19:58:58	vstinner	set	messages: + msg191881
2013-06-25 19:54:49	python-dev	set	messages: + msg191880
2013-06-25 19:44:47	jkloth	set	messages: + msg191879
2013-06-25 19:25:13	python-dev	set	messages: + msg191878
2013-06-25 19:15:41	Mark.Shannon	set	messages: + msg191877
2013-06-24 17:47:48	benjamin.peterson	set	messages: + msg191792
2013-06-24 17:20:36	jkloth	set	files: + object-2.patch messages: + msg191791
2013-06-24 16:47:27	benjamin.peterson	set	messages: + msg191786
2013-06-24 16:44:06	jkloth	set	messages: + msg191785
2013-06-24 16:04:30	benjamin.peterson	set	messages: + msg191780
2013-06-24 11:19:58	jkloth	set	messages: + msg191752
2013-06-23 21:01:18	vstinner	set	messages: + msg191731
2013-06-17 19:10:26	jkloth	set	files: + object.patch nosy: + jkloth messages: + msg191364
2013-06-04 22:48:45	amaury.forgeotdarc	set	messages: + msg190628
2013-06-04 22:29:43	vstinner	set	status: closed -> open resolution: fixed -> (no value) messages: + msg190623
2013-05-27 21:46:44	python-dev	set	status: open -> closed nosy: + python-dev messages: + msg190156 resolution: fixed stage: resolved
2013-05-16 13:27:01	amaury.forgeotdarc	set	messages: + msg189379
2013-05-16 12:54:27	Arfrever	set	nosy: + Arfrever
2013-05-16 12:14:51	ilblackdragon	set	messages: + msg189362
2013-04-09 06:42:04	amaury.forgeotdarc	set	messages: + msg186386
2013-04-07 10:11:54	asvetlov	set	nosy: + asvetlov
2013-04-03 05:07:35	ilblackdragon	set	files: + 17206-3.diff messages: + msg185889
2013-03-31 22:30:58	lemburg	set	nosy: + lemburg messages: + msg185689
2013-03-31 19:29:49	larry	set	messages: + msg185643
2013-03-31 14:25:17	Mark.Shannon	set	nosy: + Mark.Shannon messages: + msg185627
2013-03-31 11:56:21	mark.dickinson	set	messages: + msg185625
2013-03-31 11:05:53	mark.dickinson	set	messages: + msg185623
2013-03-31 05:10:53	ilblackdragon	set	files: + 17206-2.diff messages: + msg185614
2013-03-28 01:13:50	vstinner	set	nosy: + vstinner messages: + msg185396 versions: - Python 3.1, Python 2.7, Python 3.2, Python 3.3, Python 3.5
2013-03-21 07:44:16	amaury.forgeotdarc	set	nosy: + amaury.forgeotdarc messages: + msg184855
2013-03-21 02:51:17	ilblackdragon	set	files: + asmdiff_17206.zip messages: + msg184834
2013-03-21 02:40:23	ilblackdragon	set	files: + perf-linux.log messages: + msg184833
2013-03-18 19:15:32	mark.dickinson	set	messages: + msg184495
2013-03-18 18:59:53	ilblackdragon	set	messages: + msg184492
2013-03-18 18:43:55	mark.dickinson	set	nosy: + mark.dickinson messages: + msg184488
2013-03-18 06:49:52	ilblackdragon	set	messages: + msg184433
2013-03-18 06:47:28	ilblackdragon	set	files: + perf.log messages: + msg184432
2013-03-18 06:46:04	ilblackdragon	set	nosy: + larry messages: + msg184431
2013-03-18 06:36:49	ilblackdragon	set	files: + 17206.diff nosy: + ilblackdragon messages: + msg184430 keywords: + patch
2013-02-16 04:12:29	benjamin.peterson	set	nosy: + benjamin.peterson messages: + msg182209
2013-02-16 01:40:43	jcea	set	nosy: + jcea
2013-02-14 16:32:30	brett.cannon	set	nosy: + brett.cannon messages: + msg182111
2013-02-14 16:07:37	dmalcolm	create