classification
Title: Py_XDECREF() expands its argument multiple times
Type: behavior Stage: resolved
Components: Interpreter Core Versions: Python 3.4
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: Arfrever, Mark.Shannon, amaury.forgeotdarc, asvetlov, benjamin.peterson, brett.cannon, dmalcolm, haypo, ilblackdragon, jcea, jkloth, larry, lemburg, mark.dickinson, python-dev
Priority: normal Keywords: patch

Created on 2013-02-14 16:07 by dmalcolm, last changed 2016-09-25 15:52 by SilentGhost. This issue is now closed.

Files
File name Uploaded Description Edit
17206.diff ilblackdragon, 2013-03-18 06:36 Patch for 17206 review
perf.log ilblackdragon, 2013-03-18 06:47 Benchmark comparison for 17206 patch
perf-linux.log ilblackdragon, 2013-03-21 02:40 Benchmark comparison for 17206 in Ubuntu + GCC
asmdiff_17206.zip ilblackdragon, 2013-03-21 02:51 Preprocessed, object and disassembled files of ceval module with and without patch.
17206-2.diff ilblackdragon, 2013-03-31 05:10 Updated patch for 17206 review
17206-3.diff ilblackdragon, 2013-04-03 05:07 review
object.patch jkloth, 2013-06-17 19:10 Patch reusing _py_tmp review
object-2.patch jkloth, 2013-06-24 17:20 review
project.diff jkloth, 2013-07-02 15:28 review
Messages (48)
msg182107 - (view) Author: Dave Malcolm (dmalcolm) (Python committer) Date: 2013-02-14 16:07
When running my refcount static analyzer [1] on a large corpus of real-world C extension code for Python, I ran into the following construct in various places:

    Py_XDECREF(PyObject_CallObject(callable, args));

This is bogus code: Py_XDECREF expands its argument multiple times, so after this goes through the C preprocessor the callable is actually called invoked up to 4 times, leaking refs to 3 of the results - assuming that none of the calls fail, and a non pydebug build of CPython (which would expand the argument more times).

The raw preprocessor output for an optimized Python 2.7 looks like this:
   do { if ((PyObject_CallObject(callable, args)) == ((void *)0)) ; else do { if ( --((PyObject*)(PyObject_CallObject(callable, args)))->ob_refcnt != 0) ; else ( (*(((PyObject*)((PyObject *)(PyObject_CallObject(callable, args))))->ob_type)->tp_dealloc)((PyObject *)((PyObject *)(PyObject_CallObject(callable, args))))); } while (0); } while (0);

Cleaned up (and removing some of the casts for clarity) it looks like this:
  do { 
    if ((PyObject_CallObject(callable, args)) == ((void *)0))
      ; 
    else 
      do {
        if (--(PyObject_CallObject(callable, args)->ob_refcnt) != 0)
          ;
        else
          (*(PyObject_CallObject(callable, args)->ob_type)->tp_dealloc)
            PyObject_CallObject(callable, args);
      } while (0);
  } while (0);
which is clearly not what the extension author was expecting.

Should we:
  (A) update the macro so that it expands its argument only once, or
  (B) warn people not to write code like the above?

Similar considerations apply to the other macros, I guess, but I've seen the above used "in the wild", I haven't yet seen it for the other macros).

Seen in the wild in:
  python-alsa-1.0.25-1.fc17:
      pyalsa/alsamixer.c:179
    00174 | 	for (i = 0; i < count; i++) { 
    00175 | 		t = PyTuple_New(2); 
    00176 | 		if (t) { 
    00177 | 			PyTuple_SET_ITEM(t, 0, PyInt_FromLong(pfd[i].fd)); 
    00178 | 			PyTuple_SET_ITEM(t, 1, PyInt_FromLong(pfd[i].events)); 
    00179>| 			Py_XDECREF(PyObject_CallObject(reg, t)); 
    00180 | 			Py_DECREF(t); 
    00181 | 		} 
    00182 | 	}	 
    00183 |  
    00184 | 	Py_XDECREF(reg); 

      pyalsa/alsahcontrol.c:190
    00185 | 	for (i = 0; i < count; i++) { 
    00186 | 		t = PyTuple_New(2); 
    00187 | 		if (t) { 
    00188 | 			PyTuple_SET_ITEM(t, 0, PyInt_FromLong(pfd[i].fd)); 
    00189 | 			PyTuple_SET_ITEM(t, 1, PyInt_FromLong(pfd[i].events)); 
    00190>| 			Py_XDECREF(PyObject_CallObject(reg, t)); 
    00191 | 			Py_DECREF(t); 
    00192 | 		} 
    00193 | 	}	 
    00194 |  
    00195 | 	Py_XDECREF(reg); 

      pyalsa/alsaseq.c:3277
    03272 |     for (i = 0; i < count; i++) { 
    03273 |         t = PyTuple_New(2); 
    03274 |         if (t) { 
    03275 |             PyTuple_SET_ITEM(t, 0, PyInt_FromLong(pfd[i].fd)); 
    03276 |             PyTuple_SET_ITEM(t, 1, PyInt_FromLong(pfd[i].events)); 
    03277>|             Py_XDECREF(PyObject_CallObject(reg, t)); 
    03278 |             Py_DECREF(t); 
    03279 |         } 
    03280 |     } 
    03281 |  
    03282 |     Py_XDECREF(reg); 

 python-4Suite-XML-1.0.2-14.fc17:
      Ft/Xml/src/domlette/refcounts.c:80
    00075 |     /* refcount = this */ 
    00076 |     expected = 1; 
    00077 |   } 
    00078 |   else { 
    00079 |     sprintf(buf, "Unexpected object type '%.200s'" ,node->ob_type->tp_name); 
    00080>|     Py_XDECREF(PyObject_CallMethod(tester, "error", "s", buf)); 
    00081 |     return 0; 
    00082 |   } 
    00083 |  
    00084 |   sprintf(buf, "%.200s refcounts", node->ob_type->tp_name); 
    00085 |   return do_test(tester, buf, expected, node->ob_refcnt); 


[Note to self: I actually saw this because it uncovered a traceback in cpychecker, which I fixed as:
http://git.fedorahosted.org/cgit/gcc-python-plugin.git/commit/?id=99fa820487e380b74c2eda1d97facdf2ee6d2f3a ]


[1] https://gcc-python-plugin.readthedocs.org/en/latest/cpychecker.html
msg182111 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2013-02-14 16:32
On Thu, Feb 14, 2013 at 11:07 AM, Dave Malcolm <report@bugs.python.org>wrote:

>
> New submission from Dave Malcolm:
>
> When running my refcount static analyzer [1] on a large corpus of
> real-world C extension code for Python, I ran into the following construct
> in various places:
>
>     Py_XDECREF(PyObject_CallObject(callable, args));
>

Eww.

>
> This is bogus code: Py_XDECREF expands its argument multiple times, so
> after this goes through the C preprocessor the callable is actually called
> invoked up to 4 times, leaking refs to 3 of the results - assuming that
> none of the calls fail, and a non pydebug build of CPython (which would
> expand the argument more times).
>
> The raw preprocessor output for an optimized Python 2.7 looks like this:
>    do { if ((PyObject_CallObject(callable, args)) == ((void *)0)) ; else
> do { if ( --((PyObject*)(PyObject_CallObject(callable, args)))->ob_refcnt
> != 0) ; else ( (*(((PyObject*)((PyObject *)(PyObject_CallObject(callable,
> args))))->ob_type)->tp_dealloc)((PyObject *)((PyObject
> *)(PyObject_CallObject(callable, args))))); } while (0); } while (0);
>
> Cleaned up (and removing some of the casts for clarity) it looks like this:
>   do {
>     if ((PyObject_CallObject(callable, args)) == ((void *)0))
>       ;
>     else
>       do {
>         if (--(PyObject_CallObject(callable, args)->ob_refcnt) != 0)
>           ;
>         else
>           (*(PyObject_CallObject(callable, args)->ob_type)->tp_dealloc)
>             PyObject_CallObject(callable, args);
>       } while (0);
>   } while (0);
> which is clearly not what the extension author was expecting.
>
> Should we:
>   (A) update the macro so that it expands its argument only once, or
>

How expensive is that going to be? Since there is a 'do' statement using a
temp variable would be easy to introduce, but is a compiler going to be
smart enough to inline it all so it doesn't have to waste an allocation,
etc.?

>   (B) warn people not to write code like the above?
>

Only if (A) is too costly. I would modify a checkout for ALL refcount
macros and run the benchmark suite to see if there is a noticeable
difference. If there isn't then I say change all the macros (can't make one
safe and the rest not as that is just asking for inconsistent usage and
trouble).

-Brett

>
> Similar considerations apply to the other macros, I guess, but I've seen
> the above used "in the wild", I haven't yet seen it for the other macros).
>
> Seen in the wild in:
>   python-alsa-1.0.25-1.fc17:
>       pyalsa/alsamixer.c:179
>     00174 |     for (i = 0; i < count; i++) {
>     00175 |             t = PyTuple_New(2);
>     00176 |             if (t) {
>     00177 |                     PyTuple_SET_ITEM(t, 0,
> PyInt_FromLong(pfd[i].fd));
>     00178 |                     PyTuple_SET_ITEM(t, 1,
> PyInt_FromLong(pfd[i].events));
>     00179>|                     Py_XDECREF(PyObject_CallObject(reg, t));
>     00180 |                     Py_DECREF(t);
>     00181 |             }
>     00182 |     }
>     00183 |
>     00184 |     Py_XDECREF(reg);
>
>       pyalsa/alsahcontrol.c:190
>     00185 |     for (i = 0; i < count; i++) {
>     00186 |             t = PyTuple_New(2);
>     00187 |             if (t) {
>     00188 |                     PyTuple_SET_ITEM(t, 0,
> PyInt_FromLong(pfd[i].fd));
>     00189 |                     PyTuple_SET_ITEM(t, 1,
> PyInt_FromLong(pfd[i].events));
>     00190>|                     Py_XDECREF(PyObject_CallObject(reg, t));
>     00191 |                     Py_DECREF(t);
>     00192 |             }
>     00193 |     }
>     00194 |
>     00195 |     Py_XDECREF(reg);
>
>       pyalsa/alsaseq.c:3277
>     03272 |     for (i = 0; i < count; i++) {
>     03273 |         t = PyTuple_New(2);
>     03274 |         if (t) {
>     03275 |             PyTuple_SET_ITEM(t, 0, PyInt_FromLong(pfd[i].fd));
>     03276 |             PyTuple_SET_ITEM(t, 1,
> PyInt_FromLong(pfd[i].events));
>     03277>|             Py_XDECREF(PyObject_CallObject(reg, t));
>     03278 |             Py_DECREF(t);
>     03279 |         }
>     03280 |     }
>     03281 |
>     03282 |     Py_XDECREF(reg);
>
>  python-4Suite-XML-1.0.2-14.fc17:
>       Ft/Xml/src/domlette/refcounts.c:80
>     00075 |     /* refcount = this */
>     00076 |     expected = 1;
>     00077 |   }
>     00078 |   else {
>     00079 |     sprintf(buf, "Unexpected object type '%.200s'"
> ,node->ob_type->tp_name);
>     00080>|     Py_XDECREF(PyObject_CallMethod(tester, "error", "s", buf));
>     00081 |     return 0;
>     00082 |   }
>     00083 |
>     00084 |   sprintf(buf, "%.200s refcounts", node->ob_type->tp_name);
>     00085 |   return do_test(tester, buf, expected, node->ob_refcnt);
>
>
> [Note to self: I actually saw this because it uncovered a traceback in
> cpychecker, which I fixed as:
>
> http://git.fedorahosted.org/cgit/gcc-python-plugin.git/commit/?id=99fa820487e380b74c2eda1d97facdf2ee6d2f3a]
>
>
> [1] https://gcc-python-plugin.readthedocs.org/en/latest/cpychecker.html
>
> ----------
> components: Interpreter Core
> messages: 182107
> nosy: dmalcolm
> priority: normal
> severity: normal
> status: open
> title: Py_XDECREF() expands its argument multiple times
> versions: Python 2.7, Python 3.1, Python 3.2, Python 3.3, Python 3.4,
> Python 3.5
>
> _______________________________________
> Python tracker <report@bugs.python.org>
> <http://bugs.python.org/issue17206>
> _______________________________________
> _______________________________________________
> New-bugs-announce mailing list
> New-bugs-announce@python.org
> http://mail.python.org/mailman/listinfo/new-bugs-announce
>
msg182209 - (view) Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2013-02-16 04:12
I wish we could use inline functions...
msg184430 - (view) Author: Illia Polosukhin (ilblackdragon) * Date: 2013-03-18 06:36
I've worked on this with Dave Malcolm @PyCon2013 sprints.

This patch is work in progress to make Py_XDECREF() and Py_XINCREF() expands their arguments once instead of multiple times.

Because patch is work in progress, it contains old version for ease of benchmarking. To enable the old version - define OLD_17206.

Tested so far only with: Clang on Mac OS X 10.7 x64.
msg184431 - (view) Author: Illia Polosukhin (ilblackdragon) * Date: 2013-03-18 06:46
Additionally, in macros Py_XINCREF and Py_XDECREF we've took an opportunity to increase readability by changing expression:
> if (item == NULL) ; else action(item);
to more natural inverted condition:
> if (item != NULL) action(item);

There is a chance that there may be a reason for this form of expression, so let me know if this is an issue.
msg184432 - (view) Author: Illia Polosukhin (ilblackdragon) * Date: 2013-03-18 06:47
Benchmark run on Clang Mac OS X 10.7 attached of comparison with and without patch 17206.diff.
msg184433 - (view) Author: Illia Polosukhin (ilblackdragon) * Date: 2013-03-18 06:49
Command used for benchmarking was:

python perf.py -b 2n3 -f ../cpython/baseline-clang/python.exe ../cpython/experiment-clang/python.exe | tee perf.log
msg184488 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2013-03-18 18:43
I'd suggest less generic names for the temporary variables, just to minimise the chance of accidental name collisions.  _py_xincref_tmp and _py_xdecref_tmp, perhaps?
msg184492 - (view) Author: Illia Polosukhin (ilblackdragon) * Date: 2013-03-18 18:59
That names were my first idea - but then I saw Py_CLEAR uses _py_tmp variable I used it.

Should I replace in Py_CLEAR to _py_clear_tmp as well?
msg184495 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2013-03-18 19:15
Ignore me; I don't know what I was thinking.  _py_tmp is fine.  There's no  issue with name clashes.
msg184833 - (view) Author: Illia Polosukhin (ilblackdragon) * Date: 2013-03-21 02:40
Compiled baseline and patched version with GCC in Ubuntu 12.10 (running in VMWare).

Benchmarking results are attached:
./perf.py -b 2n3 -f ../cpython/baseline/python ../cpython/experiment/python | tee perf-linux.log
msg184834 - (view) Author: Illia Polosukhin (ilblackdragon) * Date: 2013-03-21 02:51
Additionally, disassembled ceval.o and compared baseline with experiment (with applied patch): no actual differences found.

Attached archive contains ceval from both baseline and experiment builds:
 - cevalb.cc - processed file of baseline build
 - ceval.cc - processed file of experiment build
diff ceval.cc cevalb.cc - shows differences introduced by patch.
 - cevalb.o - object file of baseline build
 - ceval.o - object file of experiment build
 - cevalb.asm - disassembly of baseline object file
 - ceval.asm - disassembly of experiment object file
diff ceval.asm cevalb.asm - doesn't show any differences in generated assembly code.
msg184855 - (view) Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2013-03-21 07:44
- The patch contains a "#ifdef OLD_17206" that should be removed.

- I know that these macros are already used everywhere, but a test for the new feature would be nice (in _testcapimodule.c)
For example, Py_DECREF(PyLong_FromLong(0)) does not fail or leak when tests are run with -R.
msg185396 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2013-03-28 01:13
I'm not sure that it is a good idea to patch such major function (macro) in a minor version (ex: Python 2.7.x). I changed the Versions field to only select Python 3.4.
msg185614 - (view) Author: Illia Polosukhin (ilblackdragon) * Date: 2013-03-31 05:10
Updated patch - removed old code and addded tests to _testcapimoudule.c as Amaury suggested.

Amaury, why do you mention -R flag (from what I see it does hash randomization)? I would expect some flag, that would enforce memory leakage issues to fail test.
msg185623 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2013-03-31 11:05
> Amaury, why do you mention -R flag

He's talking about the -R argument to regrtest:

"""
-R runs each test several times and examines sys.gettotalrefcount() to
see if the test appears to be leaking references.  The argument should
be of the form stab:run:fname where 'stab' is the number of times the
test is run to let gettotalrefcount settle down, 'run' is the number
of times further it is run and 'fname' is the name of the file the
reports are written to.  These parameters all have defaults (5, 4 and
"reflog.txt" respectively), and the minimal invocation is '-R :'.
"""
msg185625 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2013-03-31 11:56
Thanks for the updated patch.  Some comments:

- In the tests, your functions should have return type PyObject* rather than void;  you can use Py_RETURN_NONE as a convenient way to return something of the correct type.  E.g.:

static PyObject *
test_decref_doesnt_leak(PyObject* self)
{
    Py_DECREF(PyLong_FromLong(0));
    Py_RETURN_NONE;
}

- Unfortunately, the test above *will* leak:  even though your patch fixes Py_XDECREF, Py_DECREF will still evaluate the argument multiple times.  So we get, for example:

iwasawa:cpython mdickinson$ ./python.exe -m test -R:: test_capi
[1/1] test_capi
beginning 9 repetitions
123456789
.........
test_capi leaked [1, 1, 1, 1] references, sum=4
1 test failed:
    test_capi

- It would be great to have tests for Py_INCREF and Py_XINCREF, too.

- The patch introduces two extra blank lines before Py_CLEAR.

- Not specifically about the patch, but: there's a comment in object.h that reads:

"""
*** WARNING*** The Py_DECREF macro must have a side-effect-free argument
since it may evaluate its argument multiple times.  (The alternative
would be to mace it a proper function or assign it to a global temporary
variable first, both of which are slower; and in a multi-threaded
environment the global variable trick is not safe.)
"""

I don't understand why the author of that comment ruled out the possibility of a local temporary variable.  Anyone have any ideas?  A constraint of some flavours of pre-ANSI C, perhaps?
msg185627 - (view) Author: Mark Shannon (Mark.Shannon) * Date: 2013-03-31 14:25
Mark, it was written 20 years ago. Who knows (or cares) why it was written that way?
Let's just write it correctly this time.


Py_INCREF, Py_DECREF, Py_XDECREF and Py_XINCREF should all expand their argument exactly once.

Py_CLEAR should expand its argument exactly twice, once as an rvalue, then once as an lvalue.

So could we fix Py_DECREF and Py_CLEAR as well, please?
msg185643 - (view) Author: Larry Hastings (larry) * (Python committer) Date: 2013-03-31 19:29
For the record: I care.  Generally speaking CPython is a lovingly
crafted source tree, and the choices its architects made are nearly
always sensible and well-reasoned.  When I see things like this, things
that seem kind of dumb on first glance, I worry that we're missing
something.

It could be that it did have a sensible reason fifteen years ago, like
some bone-headed compiler generated awful code--and the macros *are*
used everywhere, so maybe performance genuinely suffered.  Or maybe
the macros were originally intended to be usable as rvalues; it looks
like Py_INCREF still permits that, and changing it to be a statement
could be considered a breaking API change.

But we can't let ourselves become hidebound and superstitious.  I see
no reason why a local temporary variable shouldn't be fine here, and
perhaps in all the macros Mark cites.  So don't get me wrong, I'm +1.

(The documentation on Py_INCREF makes no mention of whether it's legal
as an rvalue, and all the examples use it as a statement.  Perhaps we
should document it explicitly so for 3.4.)
msg185689 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2013-03-31 22:30
On 31.03.2013 21:29, Larry Hastings wrote:
> 
> Larry Hastings added the comment:
> 
> For the record: I care.  Generally speaking CPython is a lovingly
> crafted source tree, and the choices its architects made are nearly
> always sensible and well-reasoned.  When I see things like this, things
> that seem kind of dumb on first glance, I worry that we're missing
> something.

Did someone test the patch on a use case where the performance of
this macro matters a lot ? E.g. deallocating large lists or
dictionaries.

Todays CPUs have plenty of registers available, so it's possible
that the patch actually speeds up things, since the decref'ed
variable may well be an indirect reference. At the time Py_DECREF
was written, registers were precious golden flasks, only to be
filled with the most valuable of things ;-)

Just want to be sure we're not missing something...
msg185889 - (view) Author: Illia Polosukhin (ilblackdragon) * Date: 2013-04-03 05:07
Fixed Py_DECREF and Py_CLEAR as well. 

Added tests for Py_INCREF and Py_XINCREF (if somebody has a better idea how to tests that INCREF doesn't leak - please, let me know).

Removed comment that Py_DECREF evaluate it's argument multiple times as not relevant anymore.

About considerations from performance point of view - I've made toy example (only this defines and main function) to test how gcc optimizer behaves in different cases - from what I see, if expression is like this (which is majority of cases in the code):
 PyObject* obj = Foo();
 Py_XDECREF(obj)

assembly code that will be produced (with -O3) is the same before and after patch.
msg186386 - (view) Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2013-04-09 06:42
As I wrote in issue17589, there are some extension modules (pytables) that which assume that Py_INCREF is an expression:
  return Py_INCREF(x), x;
and Py_RETURN_NONE is also defined with a comma expression.

Oh, and Cython:
#define __Pyx_PyBool_FromLong(b) ((b) ? (Py_INCREF(Py_True), Py_True) : (Py_INCREF(Py_False), Py_False))
msg189362 - (view) Author: Illia Polosukhin (ilblackdragon) * Date: 2013-05-16 12:14
Amaury, 

   I didn't update Py_INCREF macro in this patch (because it doesn't expand it's argument multiple times) - so the examples you are showing will be working fine. 

   I've updated Py_XINCREF, but it can't be used as an expression anyway.
msg189379 - (view) Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2013-05-16 13:27
The last patch (17206-3.diff) has tests for the 4 macros, and looks good to me.
msg190156 - (view) Author: Roundup Robot (python-dev) Date: 2013-05-27 21:46
New changeset aff41a6421c2 by Benjamin Peterson in branch 'default':
don't expand the operand to Py_XINCREF/XDECREF/CLEAR/DECREF multiple times (closes #17206)
http://hg.python.org/cpython/rev/aff41a6421c2
msg190623 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2013-06-04 22:29
"AMD64 Windows7 SP1 3.x" buildbot is crashing in test_marshal. It looks like a regression introduced by the changeset aff41a6421c2:

[ 88/375] test_marshal
Traceback (most recent call last):
  File "../lib/test/regrtest.py", line 1611, in <module>
    main_in_temp_cwd()
  File "../lib/test/regrtest.py", line 1586, in main_in_temp_cwd
    main()
  File "../lib/test/regrtest.py", line 774, in main
    raise Exception("Child error on {}: {}".format(test, result[1]))
Exception: Child error on test_marshal: Exit code 3221225725

http://buildbot.python.org/all/builders/AMD64%20Windows7%20SP1%203.x/builds/2009/steps/test/logs/stdio
msg190628 - (view) Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2013-06-04 22:48
Hum, a stack overflow?
msg191364 - (view) Author: Jeremy Kloth (jkloth) * Date: 2013-06-17 19:10
Indeed, after debugging, it is a stack overflow caused by the introduction of the different temporary variables in the Py_XINCREF, Py_XDECREF, and Py_DECREF macros.

I've attached an updated patch that reuses the _py_tmp variable for those temporary assignments thus keeping the required stack size down.
msg191731 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2013-06-23 21:01
> I've attached an updated patch that reuses the _py_tmp variable for those temporary assignments thus keeping the required stack size down.

I don't understand how it would change the size of the C stack, could you please explain? object.patch looks like refactoring to simplify the work of the C preprocessor.
msg191752 - (view) Author: Jeremy Kloth (jkloth) * Date: 2013-06-24 11:19
At least in a debug build, the MSVC 64-bit compiler seems to allocate space for each unique variable declared in the function body.  Therefore, by changing the temporary variables to be named identically, the amount of required space is minimized.

The refactoring of Py_DECREF is needed to prevent an error for local variable defined before use when Py_DECREF is nested within another macro already defining _py_tmp (it would expand to roughly PyObject *_py_tmp = _py_tmp).
msg191780 - (view) Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2013-06-24 16:04
We've "fixed" this debug-problem in other cases simply by skipping the test in question on debug builds.
msg191785 - (view) Author: Jeremy Kloth (jkloth) * Date: 2013-06-24 16:44
Except, in this case, it actually crashes the interpreter.  I would hope to think that it isn't common practice to just hide crashers especially when there is a fix available.
msg191786 - (view) Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2013-06-24 16:47
Could you please wrap your new macros in the do .. while boilerplate?
msg191791 - (view) Author: Jeremy Kloth (jkloth) * Date: 2013-06-24 17:20
FYI, however, the new macro is designed as an internal implementation detail along the lines of the other _Py_* macros.  That is, just a tiny piece of a larger function to be used at your own risk.

Either way, I've uploaded another version with the do { } while (0) wrapping.
msg191792 - (view) Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2013-06-24 17:47
Yes, but in case someone else mindlessly starts using it elsewhere, let's spare them the macro bugz.
msg191877 - (view) Author: Mark Shannon (Mark.Shannon) * Date: 2013-06-25 19:15
I think blaming the crash in test_marshall
http://buildbot.python.org/all/builders/AMD64%20Windows7%20SP1%203.x/builds/2009/steps/test/logs/stdio
on the new macros (change aff41a6421c2) is correct.

It may be the proximate cause, but that doesn't make it the underlying one. The interpreter should be more resilient against minor changes to the amount of C stack used per call.

Can we get a full stack trace (for the C stack not the Python one) for the failure?
msg191878 - (view) Author: Roundup Robot (python-dev) Date: 2013-06-25 19:25
New changeset c31bec42e411 by Victor Stinner in branch 'default':
Issue #17206: test.regrtest and test.script_helper enable faulthandler module
http://hg.python.org/cpython/rev/c31bec42e411
msg191879 - (view) Author: Jeremy Kloth (jkloth) * Date: 2013-06-25 19:44
I'm unsure as to how to get a stack trace from Visual Studio, but I can assure you that the changes introduced by the change aff41a6421c2 *is* the cause of the failure in test_marshal.

In a nutshell, the overflow happens in a recursion on the marhsal.c:r_object() function for the test_marshal test of BugsTestCase.test_loads_recursion().
msg191880 - (view) Author: Roundup Robot (python-dev) Date: 2013-06-25 19:54
New changeset 1a9367d5aabc by Victor Stinner in branch 'default':
Issue #17206: Fix test_cmd_line and test_faulthandler for my previous change
http://hg.python.org/cpython/rev/1a9367d5aabc
msg191881 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2013-06-25 19:58
> Can we get a full stack trace (for the C stack not the Python one) for the failure?

Yes, using a debugger on Windows x64. I'm waiting for a renewal of my MSDN account to get Visual Studio (full version, not the Express version limited to 32-bit).

Until this, I enabled faulthandler in subprocesses created by test.regrtest and test.script_helper because it may help this issue, but also others future crashes.
msg191883 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2013-06-25 20:12
"I enabled faulthandler in subprocesses created by test.regrtest and test.script_helper..."

Oh, it doesn't help for this issue. On Windows, faulthandler is unable to dump the Python traceback on a stack overflow. On UNIX, it allocates a diffrent stack for its signal handler, and so the signal handler can run even on a stack overflow.

The Windows exit code 0xC00000FD (3221225725) means "Stack overflow / exhaustion".

[ 81/375] test_marshal
Traceback (most recent call last):
  File "../lib/test/regrtest.py", line 1618, in <module>
    main_in_temp_cwd()
  File "../lib/test/regrtest.py", line 1593, in main_in_temp_cwd
    main()
  File "../lib/test/regrtest.py", line 775, in main
    raise Exception("Child error on {}: {}".format(test, result[1]))
Exception: Child error on test_marshal: Exit code 3221225725
msg191888 - (view) Author: Jeremy Kloth (jkloth) * Date: 2013-06-25 22:00
Here are some results after a detailed investigation:

- in debug, each variable declaration in a block is allocated on the stack
- in debug, r_object() uses 1416 bytes of stack 
- in release, r_object() uses 512 bytes of stack
- default stack size is 1MB
- stack for python_d.exe has been already up'ed to 2100000

So it seems that the best option would be to increase the stack size used when linking (/STACK:).  I would suggest increasing it to 3MB using /STACK:3145728.
msg191919 - (view) Author: Jeremy Kloth (jkloth) * Date: 2013-06-26 21:46
Here is some additional analysis of recursive functions in the 'pythoncore' MSVC project:

Ratio  Release  Debug   Filename:Function Name
       Stack    Stack
-----------------------------------------------
 1.000	 32	  32	_sre.asm:_validate_inner
 1.159	352	 408	ast.asm:set_context
 1.167	 48	  56	compile.asm:stackdepth_walk
 1.188	128	 152	memoryobject.asm:copy_rec
 1.250	160	 200	ast.asm:num_stmts
 1.308	104	 136	firstsets.asm:calcfirstset
 1.312	128	 168	posixmodule.asm:win32_xstat_impl_w
 1.400	 40	  56	node.asm:freechildren
 1.667	 72	 120	memoryobject.asm:tolist_rec
 1.750	 32	  56	listnode.asm:list1node
 1.750	 32	  56	parsermodule.asm:parser_compare_nodes
 1.750	 32	  56	parsermodule.asm:validate_factor
 1.750	 32	  56	parsermodule.asm:validate_not_test
 1.750	 32	  56	rotatingtree.asm:RotatingTree_Enum
 1.750	 32	  56	typeobject.asm:solid_base
 1.786	112	 200	bytearrayobject.asm:bytearray_setslice
 1.900	 80	 152	compile.asm:compiler_comprehension_generator
 2.143	 56	 120	pythonrun.asm:print_exception_recursive
 2.167	 48	 104	arraymodule.asm:array_ass_slice
 2.250	 32	  72	ast.asm:validate_slice
 2.250	 32	  72	getargs.asm:skipitem
 2.250	 32	  72	node.asm:sizeofchildren
 2.250	 32	  72	parsermodule.asm:validate_test
 2.250	 32	  72	setobject.asm:set_issuperset
 2.278	144	 328	listobject.asm:list_ass_slice
 2.417	 96	 232	arraymodule.asm:array_ass_subscr
 2.429	 56	 136	longobject.asm:PyLong_AsLongLong
 2.429	 56	 136	parsermodule.asm:node2tuple
 2.429	 56	 136	typeobject.asm:mro_subclasses
 2.500	 80	 200	bytearrayobject.asm:bytearray_ass_subscript
 2.600	 40	 104	errors.asm:PyErr_GivenExceptionMatches
 2.750	 32	  88	import.asm:update_code_filenames
 2.750	 32	  88	setobject.asm:set_richcompare
 2.750	 32	  88	symtable.asm:symtable_visit_slice
 2.750	 32	  88	typeobject.asm:PyType_Modified
 2.766	512	1416	marshal.asm:r_object
 3.125	 64	 200	longobject.asm:long_rshift
 3.250	 32	 104	compile.asm:compiler_with
 3.250	 32	 104	setobject.asm:set_issubset
 3.250	 32	 104	typeobject.asm:assign_version_tag
 3.750	 32	 120	abstract.asm:abstract_issubclass
 3.750	 32	 120	typeobject.asm:PyType_Ready
 3.833	 48	 184	ast.asm:ast_for_expr
 4.250	 32	 136	compile.asm:compiler_visit_expr
 4.250	 32	 136	mathmodule.asm:factorial_partial_product
 4.500	 80	 360	genobject.asm:gen_throw
 4.692	312	1464	symtable.asm:symtable_visit_stmt
 5.250	 32	 168	_collectionsmodule.asm:deque_extend
 5.250	 32	 168	_collectionsmodule.asm:deque_extendleft
 5.250	 32	 168	ast.asm:alias_for_import_name
 5.750	 32	 184	typeobject.asm:merge_class_dict
 6.250	 32	 200	abstract.asm:PyObject_IsInstance
 6.250	 32	 200	abstract.asm:PyObject_IsSubclass
 6.250	 32	 200	ast.asm:validate_expr
 6.500	 48	 312	Python-ast.asm:obj2ast_slice
 7.182	 88	 632	parsermodule.asm:build_node_children
 7.250	 32	 232	errors.asm:PyErr_NormalizeException
 9.167	 48	 440	symtable.asm:symtable_visit_expr
10.250	 32	 328	_json.asm:encoder_listencode_obj
10.344	256	2648	Python-ast.asm:obj2ast_expr
15.955	176	2808	Python-ast.asm:obj2ast_stmt
31.750	 32	1016	Python-ast.asm:ast2obj_expr
msg192167 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2013-07-02 00:16
"So it seems that the best option would be to increase the stack size used when linking (/STACK:).  I would suggest increasing it to 3MB using /STACK:3145728."

Does Python allocate the whole stack (physical pages) at startup, or is it done on demand as on Linux?
msg192204 - (view) Author: Jeremy Kloth (jkloth) * Date: 2013-07-02 15:28
The /STACK:reserve[,commit] linker option uses 'reserve' as the upper limit of bytes to allow the executable to allocate.  The optional 'commit' value (default of 4096) is the amount of physical memory allocated as needed.

Therefore the reserve could be as large as needed to prevent stack overflows without impacting the actually memory footprint of the executable.  Maybe we should consider increasing the reserve to match that of Linux (8MB)?  From the list I posted above, the huge increase in stack use in Python-ast.c could lead to other overflows for certain expressions.

Attached is the patch to the project file to increase the stack reserve to 4MB.  The actual value doesn't impact the memory requirements of the debug build except in edge cases that would otherwise result in stack overflow exceptions.
msg192205 - (view) Author: Jeremy Kloth (jkloth) * Date: 2013-07-02 15:32
I forgot to note that with the increased stack reserve my previous patches to Py_DECREF() and related macros are no longer needed.
msg192206 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2013-07-02 15:45
Jeremy: "stack for python_d.exe has been already up'ed to 2100000"

Indeed, but not recently (changeset 97361d917d22):

<< Issue 2286: bump up the stack size of the 64-bit debug python_d.exe to 2100000. The default value of 200000 causes a stack overflow at 1965 iterations of r_object() in marshal.c, 35 iterations before the 2000 limit enforced by MAX_MARSHAL_STACK_DEPTH. >>

Oh it looks like it was the same issue: "Stack overflow exception caused by test_marshal on Windows x64". See issue #2286.

Amaury wrote in this issue (in 2008, msg63540):
"Can you try to raise the stack size on x64 builds? If 2Mb is enough for 32bit, 4Mb should be good in your case."

Jeremy: "Therefore the reserve could be as large as needed to prevent stack overflows without impacting the actually memory footprint of the executable.  Maybe we should consider increasing the reserve to match that of Linux (8MB)?"

Ah ok, if the memory footprint is unchanged, yes please increase the arbitrary limit. I would suggest 8 MB.
msg192507 - (view) Author: Roundup Robot (python-dev) Date: 2013-07-07 00:50
New changeset fcf079dece0d by Victor Stinner in branch 'default':
Issue #17206: On Windows, increase the stack size from 2 MB to 4.2 MB to fix
http://hg.python.org/cpython/rev/fcf079dece0d
History
Date User Action Args
2016-09-25 15:52:53SilentGhostsetstatus: open -> closed
type: behavior
resolution: fixed
2013-07-07 00:50:07python-devsetmessages: + msg192507
2013-07-02 15:45:01hayposetmessages: + msg192206
2013-07-02 15:32:57jklothsetmessages: + msg192205
2013-07-02 15:28:24jklothsetfiles: + project.diff

messages: + msg192204
2013-07-02 00:16:12hayposetmessages: + msg192167
2013-06-26 21:46:11jklothsetmessages: + msg191919
2013-06-25 22:00:32jklothsetmessages: + msg191888
2013-06-25 20:12:08hayposetmessages: + msg191883
2013-06-25 19:58:58hayposetmessages: + msg191881
2013-06-25 19:54:49python-devsetmessages: + msg191880
2013-06-25 19:44:47jklothsetmessages: + msg191879
2013-06-25 19:25:13python-devsetmessages: + msg191878
2013-06-25 19:15:41Mark.Shannonsetmessages: + msg191877
2013-06-24 17:47:48benjamin.petersonsetmessages: + msg191792
2013-06-24 17:20:36jklothsetfiles: + object-2.patch

messages: + msg191791
2013-06-24 16:47:27benjamin.petersonsetmessages: + msg191786
2013-06-24 16:44:06jklothsetmessages: + msg191785
2013-06-24 16:04:30benjamin.petersonsetmessages: + msg191780
2013-06-24 11:19:58jklothsetmessages: + msg191752
2013-06-23 21:01:18hayposetmessages: + msg191731
2013-06-17 19:10:26jklothsetfiles: + object.patch
nosy: + jkloth
messages: + msg191364

2013-06-04 22:48:45amaury.forgeotdarcsetmessages: + msg190628
2013-06-04 22:29:43hayposetstatus: closed -> open
resolution: fixed -> (no value)
messages: + msg190623
2013-05-27 21:46:44python-devsetstatus: open -> closed

nosy: + python-dev
messages: + msg190156

resolution: fixed
stage: resolved
2013-05-16 13:27:01amaury.forgeotdarcsetmessages: + msg189379
2013-05-16 12:54:27Arfreversetnosy: + Arfrever
2013-05-16 12:14:51ilblackdragonsetmessages: + msg189362
2013-04-09 06:42:04amaury.forgeotdarcsetmessages: + msg186386
2013-04-07 10:11:54asvetlovsetnosy: + asvetlov
2013-04-03 05:07:35ilblackdragonsetfiles: + 17206-3.diff

messages: + msg185889
2013-03-31 22:30:58lemburgsetnosy: + lemburg
messages: + msg185689
2013-03-31 19:29:49larrysetmessages: + msg185643
2013-03-31 14:25:17Mark.Shannonsetnosy: + Mark.Shannon
messages: + msg185627
2013-03-31 11:56:21mark.dickinsonsetmessages: + msg185625
2013-03-31 11:05:53mark.dickinsonsetmessages: + msg185623
2013-03-31 05:10:53ilblackdragonsetfiles: + 17206-2.diff

messages: + msg185614
2013-03-28 01:13:50hayposetnosy: + haypo

messages: + msg185396
versions: - Python 3.1, Python 2.7, Python 3.2, Python 3.3, Python 3.5
2013-03-21 07:44:16amaury.forgeotdarcsetnosy: + amaury.forgeotdarc
messages: + msg184855
2013-03-21 02:51:17ilblackdragonsetfiles: + asmdiff_17206.zip

messages: + msg184834
2013-03-21 02:40:23ilblackdragonsetfiles: + perf-linux.log

messages: + msg184833
2013-03-18 19:15:32mark.dickinsonsetmessages: + msg184495
2013-03-18 18:59:53ilblackdragonsetmessages: + msg184492
2013-03-18 18:43:55mark.dickinsonsetnosy: + mark.dickinson
messages: + msg184488
2013-03-18 06:49:52ilblackdragonsetmessages: + msg184433
2013-03-18 06:47:28ilblackdragonsetfiles: + perf.log

messages: + msg184432
2013-03-18 06:46:04ilblackdragonsetnosy: + larry
messages: + msg184431
2013-03-18 06:36:49ilblackdragonsetfiles: + 17206.diff

nosy: + ilblackdragon
messages: + msg184430

keywords: + patch
2013-02-16 04:12:29benjamin.petersonsetnosy: + benjamin.peterson
messages: + msg182209
2013-02-16 01:40:43jceasetnosy: + jcea
2013-02-14 16:32:30brett.cannonsetnosy: + brett.cannon
messages: + msg182111
2013-02-14 16:07:37dmalcolmcreate