classification
Title: _PyFunction_FastCallDict and _PyFunction_FastCallKeywords: fast path not used
Type: performance
Stage: resolved
Components: Interpreter Core
Versions: Python 3.7

process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To:
Nosy List: anselm.kruis, vstinner
Priority: normal
Keywords: patch

Created on 2017-10-21 11:14 by anselm.kruis, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Pull Requests
URL      Status  Linked
PR 4087  merged  vstinner, 2017-10-23 15:41

Messages (5)
msg304702 - Author: Anselm Kruis (anselm.kruis) Date: 2017-10-21 11:14
Just a minor performance issue.

The C functions _PyFunction_FastCallDict() and _PyFunction_FastCallKeywords() (branch 'master', Objects/call.c) and their predecessors fast_function() and _PyFunction_FastCallDict() in Python/ceval.c all contain the following sub-expression in the "if" statement that guards the fast path (for instance, at Objects/call.c:318):

 co->co_flags == (CO_OPTIMIZED | CO_NEWLOCALS | CO_NOFREE)

Now, if co_flags has any of the CO_FUTURE_... bits set, the comparison is always false and the fast path is never used.

Currently this affects only Python 3.6 and Python 2.7, because other Python versions do not use the __future__ mechanism.

The fix is simple. Replace the faulty sub-expression by

 (co->co_flags & (~PyCF_MASK)) == (CO_OPTIMIZED | CO_NEWLOCALS | CO_NOFREE)
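
The effect described above can also be observed from Python, because a code object's co_flags is readable as an attribute. The following is a minimal sketch, not the CPython implementation. It assumes Python 3.7 or later and uses "from __future__ import annotations" as the feature that sets a flag bit; FUTURE_BIT and lambda_flags() are hypothetical names introduced for illustration, and FUTURE_BIT stands in for PyCF_MASK while covering only that single bit.

```python
import __future__

# Stand-in for PyCF_MASK (assumption: only the one bit set by the
# "annotations" feature; the real C macro covers every CO_FUTURE_ bit).
FUTURE_BIT = __future__.annotations.compiler_flag


def lambda_flags(prefix):
    """Compile a tiny module containing a lambda and return the lambda's co_flags."""
    namespace = {}
    source = prefix + "f = lambda: None\n"
    # dont_inherit=True keeps any __future__ flags of the calling code
    # from leaking into the compiled source.
    code = compile(source, "<demo>", "exec", dont_inherit=True)
    exec(code, namespace)
    return namespace["f"].__code__.co_flags


plain = lambda_flags("")
with_future = lambda_flags("from __future__ import annotations\n")

# Exact equality (the shape of the original check) fails once a future bit is set.
print("raw flags equal:   ", with_future == plain)

# Masking out the future bit first (the shape of the proposed check) restores equality.
print("masked flags equal:", (with_future & ~FUTURE_BIT) == plain)
```

The two lambdas are otherwise identical, so the only difference between their flag words is the bit contributed by the __future__ import. That is the bit the proposed expression ignores.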

I discovered this issue while debugging reference leaks in Stackless Python a few months ago. It is hard to write a test case, but one can compare C call stacks using a debugger.

$ ulimit -c unlimited  # enable core dumps
$ python3.6 -c 'from __future__ import generator_stop; import os; (lambda: os.abort())()'
$ gdb -batch -ex bt  python3.6 core > trace_with_future
$ python3.6 -c 'import os; (lambda: os.abort())()'
$ gdb -batch -ex bt  python3.6 core > trace_without_future

If you compare the traces, the difference is in stack frame #9. Same for python2.7.
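
Comparing the two trace files by hand works, but a small script can point at the differing frame. The sketch below is an illustration only: frame_functions() and differing_frames() are hypothetical helpers, and the regular expression assumes the usual gdb "bt" line layout ("#N", an optional "0x... in", then the function name), which may vary between gdb versions.

```python
import re

# Assumed gdb backtrace layout: "#9  0x... in some_function (args) at file.c:123".
# The address part is optional because gdb omits it for some frames.
FRAME_RE = re.compile(r"^#(\d+)\s+(?:0x[0-9a-fA-F]+ in )?(\S+)")


def frame_functions(trace_text):
    """Map frame number -> function name for every frame line in a backtrace."""
    frames = {}
    for line in trace_text.splitlines():
        match = FRAME_RE.match(line)
        if match:
            frames[int(match.group(1))] = match.group(2)
    return frames


def differing_frames(trace_a, trace_b):
    """Return (frame number, name in a, name in b) for frames whose function differs."""
    a = frame_functions(trace_a)
    b = frame_functions(trace_b)
    return [
        (number, a.get(number), b.get(number))
        for number in sorted(set(a) | set(b))
        if a.get(number) != b.get(number)
    ]
```

To use it, read the contents of trace_with_future and trace_without_future into two strings and pass them to differing_frames(). Only function names are compared, so addresses that change between runs do not show up as differences.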
msg304811 - Author: STINNER Victor (vstinner) (Python committer) Date: 2017-10-23 15:44
> The fix is simple. Replace the faulty sub-expression by
> (co->co_flags & (~PyCF_MASK)) == (CO_OPTIMIZED | CO_NEWLOCALS | CO_NOFREE)

I proposed PR 4087 to implement this optimization.

I wouldn't call it a "fix", since the "co->co_flags == (CO_OPTIMIZED | CO_NEWLOCALS | CO_NOFREE)" check has existed since at least Python 2.7 (and Python 2.7 also has CO_FUTURE_xxx flags).

> Just a minor performance issue.

I prefer to call it a performance opportunity :-)
msg304812 - Author: STINNER Victor (vstinner) (Python committer) Date: 2017-10-23 15:45
I reset Versions to Python 3.7. I don't consider this issue a bug, only a new optimization, so it can only go into the future Python 3.7.
msg304982 - Author: STINNER Victor (vstinner) (Python committer) Date: 2017-10-25 12:26
New changeset 086c3ae5f0995a62092b9080f32dd118c2923453 by Victor Stinner in branch 'master':
bpo-31835: Optimize also FASTCALL using __future__ (#4087)
https://github.com/python/cpython/commit/086c3ae5f0995a62092b9080f32dd118c2923453
msg304983 - Author: STINNER Victor (vstinner) (Python committer) Date: 2017-10-25 12:27
Thank you Anselm Kruis for spotting this nice optimization opportunity! Sadly, as I wrote, I don't want to backport the optimization to the stable Python 3.6 branch.
History
Date                 User          Action  Args
2022-04-11 14:58:53  admin         set     github: 76016
2017-10-25 12:27:25  vstinner      set     status: open -> closed
                                           resolution: fixed
                                           stage: patch review -> resolved
2017-10-25 12:27:17  vstinner      set     messages: + msg304983
2017-10-25 12:26:19  vstinner      set     messages: + msg304982
2017-10-23 15:45:21  vstinner      set     messages: + msg304812
                                           versions: - Python 2.7, Python 3.6
2017-10-23 15:44:40  vstinner      set     messages: + msg304811
2017-10-23 15:41:26  vstinner      set     keywords: + patch
                                           stage: patch review
                                           pull_requests: + pull_request4057
2017-10-22 16:03:15  pitrou        set     nosy: + vstinner
                                           versions: + Python 3.7
2017-10-21 11:14:04  anselm.kruis  create