classification
Title: Add _PyObject_FastCall()
Type: performance Stage:
Components: Versions: Python 3.6
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: python-dev, scoder, serhiy.storchaka, vstinner, yselivanov, ztane
Priority: normal Keywords: patch

Created on 2016-05-26 10:15 by vstinner, last changed 2016-09-01 13:13 by vstinner. This issue is now closed.

Files
File name Uploaded Description
fastcall.patch vstinner, 2016-05-26 10:15
default-May26-13-36-33.log vstinner, 2016-05-26 13:09
fastcall-2.patch vstinner, 2016-08-08 00:56
fast_call_alt.patch serhiy.storchaka, 2016-08-11 20:05
fastcall-3.patch vstinner, 2016-08-16 22:18
Messages (32)
msg266422 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-05-26 10:15
Since the issue #26814 proved that avoiding the creation of temporary tuples to call Python and C functions makes Python faster (between 2% and 29% depending on the benchmark), I extracted a first "minimal" patch to start merging this work.

The first patch adds new functions:

* PyObject_CallNoArg(func) and PyObject_CallArg1(func, arg): public functions
* _PyObject_FastCall(func, args, nargs, kwargs): private function

I hesitate between the C types "int" and "Py_ssize_t" for nargs. I read once that using "int" can cause performance issues on a loop using "i++" and "data[i]" because the compiler has to handle integer overflow of the int type.

The "int" type is also annoying on 64-bit Windows: it causes compiler warnings on downcasts, such as PyTuple_GET_SIZE(co->co_argcount) stored into a C int.


_PyObject_FastCall() avoids the creation of a tuple for:

* All Python functions (PyFunction_Check)
* C functions using METH_NOARGS or METH_O

The patch removes the "cache tuple" optimization from property_descr_get(); it uses PyObject_CallArg1() instead. This means that the optimization is (currently) missed in some cases compared to the current code, but the code is safer and simpler.


The patch adds Python/pystack.c which currently only contains _PyStack_AsTuple(), but will contain more code later.


I tried to write the smallest patch, but I started to use PyObject_CallNoArg() and PyObject_CallArg1() when the code already created a tuple at each call: PyObject_CallObject(), call_function_tail() and PyEval_CallObjectWithKeywords().


In the patch, keywords are not used in fast calls, but they will be used later. I prefer to start directly with keywords rather than changing the calling convention once again later.

--

Later, I will propose other patches to:

* add METH_FASTCALL calling convention for C functions
* modify Argument Clinic to use METH_FASTCALL

So the fast call will be taken in more cases.

--

The long term plan is to slowly use the new FASTCALL calling convention "everywhere". The tricky points are the tp_new, tp_init and tp_call attributes of type objects. In the issue #26814, I wrote a patch adding Py_TPFLAGS_FASTNEW, Py_TPFLAGS_FASTINIT and Py_TPFLAGS_FASTCALL flags to use the FASTCALL calling convention for tp_new, tp_init and tp_call. The problem is that calling these methods directly looks common. If we change the calling convention of these methods, it will break the C API. I propose to discuss that later ;-)

An alternative is to add a tp_fastcall method to PyTypeObject and use a wrapper for tp_call for backward compatibility. This option also has drawbacks. Again, I propose to discuss this later, and first focus on the changes that don't break anything ;-)
msg266424 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-05-26 10:36
Quick & dirty microbenchmark: I ran bench_fast-2.py of the issue #26814. It looks like everything is slower :-p In fact, I already noticed this issue and I think that it is fixed with better compilation options: use "./configure --with-lto" and "make profile-opt". See my article:
https://haypo.github.io/journey-to-stable-benchmark-deadcode.html

----------------------------------+-------------+---------------
Tests                             |    original |       fastcall
----------------------------------+-------------+---------------
filter                            | 76.2 us (*) |  116 us (+52%)
map                               | 73.6 us (*) |  102 us (+38%)
sorted(list, key=lambda x: x)     |   82 us (*) |  121 us (+48%)
sorted(list)                      | 14.7 us (*) | 17.3 us (+18%)
b=MyBytes(); bytes(b)             |  182 ns (*) |  243 ns (+33%)
namedtuple.attr                   |  802 ns (*) | 1.44 us (+80%)
object.__setattr__(obj, "x", 1)   |  133 ns (*) |  166 ns (+25%)
object.__getattribute__(obj, "x") |  116 ns (*) |  142 ns (+22%)
getattr(1, "real")                |   76 ns (*) |   95 ns (+25%)
bounded_pymethod(1, 2)            |   72 ns (*) |  102 ns (+42%)
unbound_pymethod(obj, 1, 2)       |   71 ns (*) |   99 ns (+38%)
func()                            |   57 ns (*) |   81 ns (+41%)
func(1, 2, 3)                     |   72 ns (*) |  100 ns (+39%)
----------------------------------+-------------+---------------
Total                             |  248 us (*) |  358 us (+44%)
----------------------------------+-------------+---------------

At least, we have a starting point ;-)
msg266429 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-05-26 13:09
default-May26-13-36-33.log: CPython benchmark suite run using stable config.

Faster (15):
- regex_effbot: 1.26x faster
- telco: 1.08x faster
- unpack_sequence: 1.07x faster
- mako_v2: 1.05x faster
- meteor_contest: 1.05x faster
- chaos: 1.04x faster
- nbody: 1.04x faster
- call_method_slots: 1.04x faster
- etree_iterparse: 1.04x faster
- etree_parse: 1.04x faster
- call_method: 1.03x faster
- raytrace: 1.03x faster
- nqueens: 1.03x faster
- call_method_unknown: 1.03x faster
- formatted_logging: 1.02x faster

Slower (8):
- etree_generate: 1.05x slower
- etree_process: 1.03x slower
- call_simple: 1.03x slower
- chameleon_v2: 1.02x slower
- pathlib: 1.02x slower
- float: 1.02x slower
- silent_logging: 1.02x slower
- json_load: 1.02x slower
msg266430 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-05-26 13:10
Updated bench_fast-2.py result with Python compiled with PGO+LTO, with benchmark.py fixed to compute average + standard deviation. Only namedtuple.attr really seems slower (getattr() is even faster):

----------------------------------+-----------------------+--------------------------
Tests                             |              original |                  fastcall
----------------------------------+-----------------------+--------------------------
filter                            | 75.8 us +- 0.1 us (*) |         78.1 us +- 0.1 us
map                               | 72.6 us +- 0.1 us (*) |         71.4 us +- 0.0 us
sorted(list, key=lambda x: x)     | 83.7 us +- 0.1 us (*) |         82.3 us +- 0.3 us
sorted(list)                      | 14.9 us +- 0.0 us (*) |         14.7 us +- 0.0 us
b=MyBytes(); bytes(b)             |    199 ns +- 2 ns (*) |            194 ns +- 1 ns
namedtuple.attr                   |   830 ns +- 20 ns (*) | 1.09 us +- 0.01 us (+31%)
object.__setattr__(obj, "x", 1)   |    133 ns +- 0 ns (*) |            134 ns +- 1 ns
object.__getattribute__(obj, "x") |    117 ns +- 0 ns (*) |            115 ns +- 1 ns
getattr(1, "real")                | 93.2 ns +- 0.9 ns (*) |  76.9 ns +- 0.7 ns (-17%)
bounded_pymethod(1, 2)            | 73.4 ns +- 0.6 ns (*) |         70.7 ns +- 0.4 ns
unbound_pymethod(obj, 1, 2)       | 74.5 ns +- 0.2 ns (*) |         71.8 ns +- 0.6 ns
func()                            | 60.2 ns +- 0.4 ns (*) |         59.3 ns +- 0.1 ns
func(1, 2, 3)                     | 74.6 ns +- 0.4 ns (*) |         72.2 ns +- 0.3 ns
----------------------------------+-----------------------+--------------------------
Total                             |            249 us (*) |                    248 us
----------------------------------+-----------------------+--------------------------
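The "average + standard deviation" figures in the table above can be reproduced in spirit with a few lines of Python. This is a minimal sketch using only the standard library; it is not the benchmark.py script from issue #26814, and the helper names `summarize` and `bench` are made up for illustration.

```python
import statistics
import timeit


def summarize(samples):
    """Return (mean, sample standard deviation) of a list of timings."""
    return statistics.mean(samples), statistics.stdev(samples)


def bench(stmt, setup="pass", repeat=5, number=10_000):
    """Time `stmt` `repeat` times and summarize the per-call timings (seconds)."""
    raw = timeit.repeat(stmt, setup=setup, repeat=repeat, number=number)
    return summarize([total / number for total in raw])


if __name__ == "__main__":
    mean, dev = bench('getattr(1, "real")')
    print(f"{mean * 1e9:.1f} ns +- {dev * 1e9:.1f} ns")
```

Running each benchmark several times and reporting the spread makes it easier to tell a real change from noise.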
msg268057 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-06-09 20:23
See issue27213. Maybe fast call with keyword arguments would avoid the creation of a dict.
msg268059 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-06-09 21:26
Serhiy Storchaka added the comment:
> See issue27213. Maybe fast call with keyword arguments would avoid the creation of a dict.

In a first version of my implementation, I used dictionary items
stored as a list of (key, value) tuples in the same PyObject* C array
as positional parameters.

But in practice, it's very rare in the C code base to have to call a
function with keyword parameters, but most functions expect keyword
parameters as a dict. They are implemented with
PyArg_ParseTupleAndKeywords() which expects a dict.
msg272138 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-08-08 00:50
Rebased patch.
msg272139 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-08-08 00:57
(Oops, I removed a broken fastcall-2.patch which didn't include new pystack.c/pystack.h files. It's now fixed in the new fastcall-2.patch.)
msg272197 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-08-08 22:01
I spent the last 3 months on making the CPython benchmark suite more stable and enhancing my procedure to run benchmarks to ensure that benchmarks are more stable.

See my articles:
https://haypo-notes.readthedocs.io/microbenchmark.html#my-articles

I forked and enhanced the benchmark suite to use my perf module to run benchmarks in multiple processes:
https://hg.python.org/sandbox/benchmarks_perf

I ran this better benchmark suite on fastcall-2.patch on my laptop. The result is quite good: 
----------------
$ python3 -m perf compare_to ref.json fastcall.json -G  --min-speed=5
Slower (4):
- fastpickle/pickle_dict: 326 us +- 15 us -> 350 us +- 29 us: 1.07x slower
- regex_effbot: 49.4 ms +- 1.3 ms -> 53.0 ms +- 1.2 ms: 1.07x slower
- fastpickle/pickle: 432 us +- 8 us -> 457 us +- 10 us: 1.06x slower
- pybench.ComplexPythonFunctionCalls: 838 ns +- 11 ns -> 884 ns +- 8 ns: 1.05x slower

Faster (13):
- spectral_norm: 289 ms +- 6 ms -> 250 ms +- 5 ms: 1.16x faster
- pybench.SimpleIntFloatArithmetic: 622 ns +- 9 ns -> 559 ns +- 10 ns: 1.11x faster
- pybench.SimpleIntegerArithmetic: 621 ns +- 10 ns -> 560 ns +- 9 ns: 1.11x faster
- pybench.SimpleLongArithmetic: 891 ns +- 12 ns -> 816 ns +- 10 ns: 1.09x faster
- pybench.DictCreation: 852 ns +- 13 ns -> 788 ns +- 16 ns: 1.08x faster
- pybench.ForLoops: 10.8 ns +- 0.3 ns -> 9.99 ns +- 0.23 ns: 1.08x faster
- pybench.NormalClassAttribute: 1.85 us +- 0.02 us -> 1.72 us +- 0.04 us: 1.08x faster
- pybench.SpecialClassAttribute: 1.86 us +- 0.02 us -> 1.73 us +- 0.03 us: 1.07x faster
- pybench.NestedForLoops: 21.9 ns +- 0.3 ns -> 20.7 ns +- 0.3 ns: 1.05x faster
- pybench.SimpleListManipulation: 501 ns +- 4 ns -> 476 ns +- 5 ns: 1.05x faster
- elementtree/process: 192 ms +- 3 ms -> 183 ms +- 2 ms: 1.05x faster
- elementtree/generate: 225 ms +- 5 ms -> 214 ms +- 4 ms: 1.05x faster
- hexiom2/level_25: 21.3 ms +- 0.3 ms -> 20.3 ms +- 0.1 ms: 1.05x faster

Benchmark hidden because not significant (84): (...)
----------------

Most benchmarks are not significant, which is expected since fastcall-2.patch is really the simplest patch to start the work on "FASTCALL": it doesn't really implement any optimization, it only adds a new infrastructure to implement new optimizations.

A few benchmarks are faster (only benchmarks at least 5% faster are shown using --min-speed=5).
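The "1.07x slower" and "1.16x faster" labels, and the effect of a 5% threshold, can be checked by hand from the means in the report. The sketch below only illustrates the arithmetic, assuming the filter compares the relative change of the means against the threshold; it is not the perf module's implementation, and `describe` and `is_shown` are hypothetical names.

```python
def describe(ref_mean, new_mean):
    """Format a change like the report above, e.g. '1.07x slower'."""
    if new_mean >= ref_mean:
        return f"{new_mean / ref_mean:.2f}x slower"
    return f"{ref_mean / new_mean:.2f}x faster"


def is_shown(ref_mean, new_mean, min_speed=5.0):
    """True if the relative change is at least `min_speed` percent."""
    percent = abs(new_mean / ref_mean - 1.0) * 100.0
    return percent >= min_speed


if __name__ == "__main__":
    print(describe(326, 350))   # fastpickle/pickle_dict, in us
    print(describe(289, 250))   # spectral_norm, in ms
    print(is_shown(100, 103))   # a 3% change falls below the 5% threshold
```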

4 benchmarks are slower, but the slowdown should be temporary: new optimizations should make these benchmarks faster. See the issue #26814 for a more concrete implementation and a lot of benchmark results if you don't trust me :-)

I consider that benchmarks proved that there is no major slowdown, so fastcall-2.patch can be merged to be able to start working on real optimizations.
msg272264 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-08-09 19:45
Benchmarking results look nice, but despite the fact that this patch is only a small part of issue26814, it looks to me larger than it could be.

1. The patch includes two parts: adding _PyObject_FastCall() and adding PyObject_CallNoArg() and PyObject_CallArg1(). How large is the role of the latter functions in the speedup? Can we first just add _PyObject_FastCall() and measure the effect of adding PyObject_CallNoArg() and PyObject_CallArg1() separately? Can the existing function PyObject_Call() be optimized to achieve a comparable benefit?

2. I think that supporting keyword arguments in _PyObject_FastCall() doesn't make much sense now. Calling with keyword arguments adds so much overhead that it dwarfs the benefit of avoiding the creation of one tuple. I think that the patch can be simpler if we drop support for keyword arguments.

3. The patch adds two files for one function _PyStack_AsTuple(). I would prefer something like _PyTuple_FromArray(). It could be used in other places, not just in argument parsing.
msg272268 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-08-09 21:30
> Benchmarking results look nice, but despite the fact that this patch is
> only a small part of issue26814, it looks to me larger than it could be.

Oh I failed to express my intent. This initial patch is not expected to
introduce any speedup. In fact I noticed major performance regressions on
the CPython benchmark suite using my full fastcall patch. It took me time
to understand that they are more issues with benchmarks than my work. This
minimum patch only adds new functions but doesn't really use them. I patched
a few functions to show how the new functions can be used. I spent most of
my time just to ensure that the minimum patch doesn't introduce performance
regression.

> 1. The patch includes two parts: adding _PyObject_FastCall() and adding
> PyObject_CallNoArg() and PyObject_CallArg1(). How large is the role of the
> latter functions in the speedup?

See my remark above, no speedup is expected.

Do you suggest to not add these 2 new functions? Since they are well
defined and simple, I chose to make them public. Their API is nicer than
_PyObject_Call().

> Can existing function PyObject_Call() be optimized to achieve a
> comparable benefit?

Sorry, I don't understand. This function requires a tuple. The whole
purpose of my patch is to avoid temporary tuples.

In my full patch, PyObject_Call() calls _PyObject_FastCall() in most cases.

> 2. I think that supporting keyword arguments in _PyObject_FastCall()
> doesn't make much sense now.

Well, I can add support for keyword arguments later and start with an
assertion (fail if they are used). But I really need them in the API, and I
don't want to change the API later.

I plan to add a new METH_FASTCALL calling convention for C functions. I
would prefer to not have two new calling conventions, but use Argument
Clinic to emit efficient code to parse arguments.

> Calling with keyword arguments adds so much overhead that it dwarfs
> the benefit of avoiding the creation of one tuple. I think that the patch
> can be simpler if we drop support for keyword arguments.

Keyword arguments are optional. Having support for them cost nothing when
they are not used.

> 3. The patch adds two files for one function _PyStack_AsTuple(). I would
> prefer something like _PyTuple_FromArray(). It could be used in other
> places, not just in argument parsing.

I really want to have a "pystack" API. In this patch, the new file looks
useless, but in the full patch there are many functions including a few
complex functions. I prefer to add the file now and complete it later.

I'm limited by Mercurial and our workflow (tools). It would be much easier
to explain my work using a patch series, but it's not possible to publish a
patch series...
msg272479 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-08-11 20:05
> Do you suggest to not add these 2 new functions?

Yes, I suggest to not add them. The API for calling is already too large. 
Internally we can directly use _PyObject_FastCall(), and third party code 
should get benefit from optimized PyObject_CallFunctionObjArgs().

> > Can existing function PyObject_Call() be optimized to achieve a
> > comparable benefit?
> Sorry, I don't understand. This function requires a tuple. The whole
> purpose of my patch is to avoid temporary tuples.

Sorry, I meant PyObject_CallFunctionObjArgs() and like.

> Keyword arguments are optional. Having support for them cost nothing when
> they are not used.

My point is that if keyword arguments are used, this is not a fast call, and 
should use the old calling protocol. The overhead of creating a tuple for args is 
dwarfed by the overhead of creating a dict for kwargs and parsing it.

> I really want to have a "pystack" API. In this patch, the new file looks
> useless, but in the full patch there are many functions including a few
> complex functions. I prefer to add the file now and complete it later.

But for now there is no "pystack" API. What do you want to add? Can it be 
added with prefixes PyDict_, PyArg_ or PyEval_? On the other hand, other code can 
get a benefit from using _PyTuple_FromArray().

Here is alternative simplified patch.

1) _PyStack_AsTuple() is renamed to _PyTuple_FromArray() (-2 new files).
2) Optimized PyObject_CallFunctionObjArgs(), PyObject_CallMethodObjArgs() and 
_PyObject_CallMethodIdObjArgs().
3) Removed PyObject_CallNoArg() and PyObject_CallArg1(). Invocations are 
replaced by PyObject_CallFunctionObjArgs().
4) Removed support of keyword arguments in _PyObject_FastCall() (saved about 
20 lines and few runtime checks in _PyCFunction_FastCall).
5) Reverted changes in Objects/descrobject.c. They added a regression in 
namedtuple attributes access.
msg272503 - (view) Author: Antti Haapala (ztane) * Date: 2016-08-12 07:38
About "I hesitate between the C types "int" and "Py_ssize_t" for nargs. I read once that using "int" can cause performance issues on a loop using "i++" and "data[i]" because the compiler has to handle integer overflow of the int type."

This is true because of -fwrapv, but I believe it is true also for Py_ssize_t which is also of signed type. However, there would be a speed-up achievable by disabling -fwrapv, because only then the i++; data[i] can be safely optimized into *(++data)
msg272884 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-08-16 21:05
Serhiy Storchaka added the comment:
>> Do you suggest to not add these 2 new functions?
>
> Yes, I suggest to not add them. The API for calling is already too large.
> Internally we can directly use _PyObject_FastCall(), and third party code
> should get benefit from optimized PyObject_CallFunctionObjArgs().

Well, we can start without them, and see later if it's worth it.

I didn't propose to add new functions to make the code faster, but to
make the API simpler.

I dislike PyEval_CallObjectWithKeywords(func, arg, kw) because it has
a special case if arg is a tuple. If arg is a tuple, the tuple is
unpacked. It already led to a complex bug in the
implementation of generators! See the issue #21209. I'm not sure that
such use case is well known and understood by everyone...
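
The pitfall described above can be illustrated with a toy Python model. It only models the behaviour as described in this message (an argument that happens to be a tuple is unpacked into several arguments, any other object is passed as a single argument); it is not the C implementation, and `call_with_arg` and `show` are hypothetical names.

```python
def call_with_arg(func, arg):
    """Toy model: a tuple `arg` is treated as the full argument list."""
    if isinstance(arg, tuple):
        return func(*arg)   # the tuple is unpacked into several arguments
    return func(arg)        # anything else is passed as one argument


def show(*args):
    """Return the arguments actually received by the callee."""
    return args


if __name__ == "__main__":
    print(call_with_arg(show, [1, 2]))   # the list arrives as one argument
    print(call_with_arg(show, (1, 2)))   # the tuple arrives as two arguments
```

A caller that wants to pass a tuple as a single value gets different behaviour depending on the runtime type of the value, which is what makes this kind of API easy to misuse.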

It's common to call a function with no argument or just one argument,
so I proposed to add an obvious and simple API for these common cases.
Well, again, I will open a new issue to discuss that.

>> > Can existing function PyObject_Call() be optimized to achieve a
>> > comparable benefit?
>> Sorry, I don't understand. This function requires a tuple. The whole
>> purpose of my patch is to avoid temporary tuples.
>
> Sorry, I meant PyObject_CallFunctionObjArgs() and like.

Yes, my full patch does optimize these functions:
https://hg.python.org/sandbox/fastcall/file/2dc558e01e66/Objects/abstract.c#l2523

>> Keyword arguments are optional. Having support for them cost nothing when
>> they are not used.
>
> My point is that if keyword arguments are used, this is not a fast call, and
> should use the old calling protocol. The overhead of creating a tuple for args is
> dwarfed by the overhead of creating a dict for kwargs and parsing it.

I'm not sure that I understand your point.

For example, in my full patch, I have a METH_FASTCALL calling
convention for C functions. With this calling convention, a function
accepts positional arguments and keyword arguments. If you don't pass
keyword arguments, the call should be faster according to my
benchmarks.

How do you want to implement METH_FASTCALL if you cannot pass keyword
arguments? Does it mean that METH_FASTCALL can only be used by the
functions which don't accept keyword arguments at all?

It's ok if passing keyword arguments is not faster, but simply as fast
as before, if the "positional arguments only" case is faster, no?

>> I really want to have a "pystack" API. In this patch, the new file looks
>> useless, but in the full patch there are many functions including a few
>> complex functions. I prefer to add the file now and complete it later.
>
> But for now there is no "pystack" API. What do you want to add?

See my fastcall branch:

https://hg.python.org/sandbox/fastcall/file/2dc558e01e66/Include/pystack.h
https://hg.python.org/sandbox/fastcall/file/2dc558e01e66/Python/pystack.c

All these functions are private. They are used internally to implement
all functions of the Python C API to call functions.

> On the other hand, other code can get a benefit from using _PyTuple_FromArray().

Ah? Maybe you should open a different issue for that.

I prefer to have an API specific to build a "stack" to call functions.

> Here is alternative simplified patch.
>
> 1) _PyStack_AsTuple() is renamed to _PyTuple_FromArray() (-2 new files).
> 2) Optimized PyObject_CallFunctionObjArgs(), PyObject_CallMethodObjArgs() and
> _PyObject_CallMethodIdObjArgs().

My full patch does optimize "everything"; it's deliberate to start
with something useless but short.

> 5) Reverted changes in Objects/descrobject.c. They added a regression in
> namedtuple attributes access.

Ah? What is the regression?
msg272887 - (view) Author: Roundup Robot (python-dev) Date: 2016-08-16 21:48
New changeset 288ec55f1912 by Victor Stinner in branch 'default':
Issue #27128: Cleanup _PyEval_EvalCodeWithName()
https://hg.python.org/cpython/rev/288ec55f1912

New changeset e615718a6455 by Victor Stinner in branch 'default':
Use Py_ssize_t in _PyEval_EvalCodeWithName()
https://hg.python.org/cpython/rev/e615718a6455
msg272888 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-08-16 22:18
Patch version 3: simpler and shorter patch

* _PyObject_FastCall() keeps its kwargs parameter, but it must always be NULL. Support for keyword arguments will be added later.
* I removed PyObject_CallNoArg() and PyObject_CallArg1()
* I moved _PyStack_AsTuple() to Objects/abstract.c. A temporary home until the API grows enough to require its own file (Python/pystack.c).

I also pushed some changes unrelated to fastcall in Python/ceval.c to simplify the patch.

Very few functions are modified (directly or indirectly) to use _PyObject_FastCall():

- PyEval_CallObjectWithKeywords()
- PyObject_CallFunction()
- PyObject_CallMethod()
- _PyObject_CallMethodId()

Much more will come in following patches.
msg272891 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-08-16 22:22
> 5) Reverted changes in Objects/descrobject.c. They added a regression in
> namedtuple attributes access.

Oh, I now understand. The change makes "namedtuple.attr" slower. With fastcall-3.patch attached to this issue, the fast path is not taken on this benchmark, and so you lose the removed optimization (tuple cached in the modified descriptor function).

In fact, you need the "full" fastcall change to make this attribute lookup *faster*:
https://bugs.python.org/issue26814#msg263999

So yeah, it's better to wait until more changes are merged.
msg273136 - (view) Author: Roundup Robot (python-dev) Date: 2016-08-19 15:32
New changeset a1a29d20f52d by Victor Stinner in branch 'default':
Add _PyObject_FastCall()
https://hg.python.org/cpython/rev/a1a29d20f52d

New changeset 89e4ad001f3d by Victor Stinner in branch 'default':
PyEval_CallObjectWithKeywords() uses fast call
https://hg.python.org/cpython/rev/89e4ad001f3d

New changeset 7cd479573de9 by Victor Stinner in branch 'default':
call_function_tail() uses fast call
https://hg.python.org/cpython/rev/7cd479573de9

New changeset 34af2edface9 by Victor Stinner in branch 'default':
Cleanup call_function_tail()
https://hg.python.org/cpython/rev/34af2edface9

New changeset adceb14cab96 by Victor Stinner in branch 'default':
Cleanup callmethod()
https://hg.python.org/cpython/rev/adceb14cab96

New changeset 10f1a4910adb by Victor Stinner in branch 'default':
PEP 7: add {...} around null_error() in abstract.c
https://hg.python.org/cpython/rev/10f1a4910adb

New changeset 5cf9524f2923 by Victor Stinner in branch 'default':
Avoid call_function_tail() for empty format str
https://hg.python.org/cpython/rev/5cf9524f2923

New changeset f1ad6f64a11e by Victor Stinner in branch 'default':
Fix PyObject_Call() parameter names
https://hg.python.org/cpython/rev/f1ad6f64a11e
msg273140 - (view) Author: Roundup Robot (python-dev) Date: 2016-08-19 16:00
New changeset 2da6dc1c30d8 by Victor Stinner in branch 'default':
contains and rich compare slots use fast call
https://hg.python.org/cpython/rev/2da6dc1c30d8

New changeset 2d4d40da2aba by Victor Stinner in branch '3.5':
Fix a refleak in call_method()
https://hg.python.org/cpython/rev/2d4d40da2aba

New changeset 5b1ed48aedef by Victor Stinner in branch '2.7':
Fix a refleak in call_method()
https://hg.python.org/cpython/rev/5b1ed48aedef

New changeset df4efc23ab18 by Victor Stinner in branch '3.5':
Fix a refleak in call_maybe()
https://hg.python.org/cpython/rev/df4efc23ab18

New changeset 7669fb39a9ce by Victor Stinner in branch '2.7':
Fix a refleak in call_maybe()
https://hg.python.org/cpython/rev/7669fb39a9ce
msg273143 - (view) Author: Roundup Robot (python-dev) Date: 2016-08-19 16:54
New changeset 73b00fb1dc9d by Victor Stinner in branch 'default':
Cleanup call_method() and call_maybe()
https://hg.python.org/cpython/rev/73b00fb1dc9d

New changeset 8e085070ab28 by Victor Stinner in branch 'default':
call_method() and call_maybe() now use fast call
https://hg.python.org/cpython/rev/8e085070ab28

New changeset 2d2bc1906b5b by Victor Stinner in branch 'default':
Issue #27128: Cleanup slot_sq_item()
https://hg.python.org/cpython/rev/2d2bc1906b5b

New changeset 6eb586b85fa1 by Victor Stinner in branch 'default':
Issue #27128: slot_sq_item() uses fast call
https://hg.python.org/cpython/rev/6eb586b85fa1

New changeset 605a42a50496 by Victor Stinner in branch 'default':
Issue #27128: Cleanup slot_nb_bool()
https://hg.python.org/cpython/rev/605a42a50496

New changeset 6a21b6599692 by Victor Stinner in branch 'default':
slot_nb_bool() now uses fast call
https://hg.python.org/cpython/rev/6a21b6599692

New changeset 45d2b5c12b19 by Victor Stinner in branch 'default':
slot_tp_iter() now uses fast call
https://hg.python.org/cpython/rev/45d2b5c12b19

New changeset 124d5d0ef81f by Victor Stinner in branch 'default':
calliter_iternext() now uses fast call
https://hg.python.org/cpython/rev/124d5d0ef81f

New changeset 71c22e592a9b by Victor Stinner in branch 'default':
keyobject_richcompare() now uses fast call
https://hg.python.org/cpython/rev/71c22e592a9b
msg273144 - (view) Author: Roundup Robot (python-dev) Date: 2016-08-19 17:01
New changeset 3ab32f7add6e by Victor Stinner in branch 'default':
Issue #27128: _pickle uses fast call
https://hg.python.org/cpython/rev/3ab32f7add6e
msg273153 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-08-19 19:16
Ok, I updated the most simple forms of function calls.

I will open new issues for more complex calls and more sensitive parts of
the code like ceval.c.

Buildbots seem to be happy.
msg273166 - (view) Author: Roundup Robot (python-dev) Date: 2016-08-19 23:50
New changeset c2af917bde71 by Victor Stinner in branch 'default':
PyFile_WriteObject() now uses fast call
https://hg.python.org/cpython/rev/c2af917bde71

New changeset 0da1ce362d15 by Victor Stinner in branch 'default':
import_name() now uses fast call
https://hg.python.org/cpython/rev/0da1ce362d15

New changeset e5b24f595235 by Victor Stinner in branch 'default':
PyErr_PrintEx() now uses fast call
https://hg.python.org/cpython/rev/e5b24f595235

New changeset 154f78d387f9 by Victor Stinner in branch 'default':
call_trampoline() now uses fast call
https://hg.python.org/cpython/rev/154f78d387f9

New changeset 351b987d6d1c by Victor Stinner in branch 'default':
sys_pyfile_write_unicode() now uses fast call
https://hg.python.org/cpython/rev/351b987d6d1c

New changeset abb93035ebb7 by Victor Stinner in branch 'default':
_elementtree: deepcopy() now uses fast call
https://hg.python.org/cpython/rev/abb93035ebb7

New changeset 2954d2aa4c90 by Victor Stinner in branch 'default':
pattern_subx() now uses fast call
https://hg.python.org/cpython/rev/2954d2aa4c90
msg273175 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-08-20 00:42
I created two new issues:

* issue #27809: _PyObject_FastCall(): add support for keyword arguments
* issue #27810: Add METH_FASTCALL: new calling convention for C functions
msg273349 - (view) Author: Stefan Behnel (scoder) * Date: 2016-08-22 10:14
FYI: I copied your (no-kwargs) implementation over into Cython and I get around 17% faster calls to Python functions with 2 positional arguments.
msg273350 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-08-22 10:33
> FYI: I copied your (no-kwargs) implementation over into Cython and I get around 17% faster calls to Python functions with 2 positional arguments.

Hey, cool! It's always cool to get performance enhancement without having to break the C API nor having to modify source code :-)

What do you mean by "I copied your (no-kwargs) implementation"? The whole giant patch? Or just a few changes? Which changes?
msg273351 - (view) Author: Roundup Robot (python-dev) Date: 2016-08-22 10:37
New changeset 7dd85b19c873 by Victor Stinner in branch 'default':
Optimize call to Python function without argument
https://hg.python.org/cpython/rev/7dd85b19c873
msg273365 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-08-22 12:29
The problem is that passing keyword arguments as a dict is not the most efficient way, due to the overhead of creating a dict. For now, keyword arguments are pushed on the stack as an interlaced array of keyword names and values. It may be more efficient to push values and names as continuous arrays (issue27213). PyArg_ParseTupleAndKeywords() accepts a tuple and a dict, but the private function _PyArg_ParseTupleAndKeywordsFast() (issue27574) can be changed to accept positional and keyword arguments as continuous arrays: (int nargs, PyObject **args, int nkwargs, PyObject **kwnames, PyObject **kwargs). Therefore we will be forced either to change the signature of _PyObject_FastCall() and the meaning of METH_FASTCALL, or to add new _PyObject_FastCallKw() and METH_FASTCALLKW to support fast passing of keyword arguments. Or maybe also add _PyObject_FastCallNoKw() for faster passing of only positional arguments without the overhead of _PyObject_FastCall(), and make the older _PyObject_FastCall() and METH_FASTCALL obsolete.
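
The two layouts discussed above (interlaced names and values versus separate continuous arrays) can be pictured with a small Python sketch. It only models the data layout, not the C stack or any CPython function, and `interlace` and `split_kwargs` are hypothetical names.

```python
def interlace(kwargs):
    """Interlaced layout: name, value, name, value, ..."""
    flat = []
    for name, value in kwargs.items():
        flat.extend((name, value))
    return flat


def split_kwargs(kwargs):
    """Continuous layout: one array of names and a parallel array of values."""
    names = tuple(kwargs)
    values = tuple(kwargs[name] for name in names)
    return names, values


if __name__ == "__main__":
    call_kwargs = {"key": len, "reverse": True}
    print(interlace(call_kwargs))
    print(split_kwargs(call_kwargs))
```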

There is yet another possibility. Argument Clinic can generate a dict that maps keyword argument names to indices of arguments and tie it to a function. External code should map names to indices using this dictionary and pass arguments as just a continuous array to a function with METH_FASTCALL (raising an error if some argument is passed both as positional and as keyword, or if a keyword-only argument is passed as positional, etc.). In that case the kwargs parameter of _PyObject_FastCall() becomes obsolete too.
msg273371 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-08-22 13:00
Serhiy: I "moved" your msg273365 to the issue #27809.
msg273388 - (view) Author: Stefan Behnel (scoder) * Date: 2016-08-22 17:52
> What do you mean by "I copied your (no-kwargs) implementation"?

I copied what you committed into CPython for _PyFunction_FastCall():

https://github.com/cython/cython/commit/8f3d3bd199a3d7f2a9fdfec0af57145b3ab363ca

and then enabled its usage in a couple of places:

https://github.com/cython/cython/commit/a3cfec8f7bd6d585831dd6669f6dad5f88303c71

especially for all function/method calls that we generate for user code:

https://github.com/cython/cython/commit/a51df339f395634f57b77e3ec13cecb3a28a5462

Note that PyMethod objects get unpacked into function+self right before the PyFunction_Check(), so the tuple avoidance optimisation also applies to Python method calls.
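
The unpacking of a bound method into function + self can be shown at the Python level. This sketch uses the documented `__func__` and `__self__` attributes of bound methods; it is not Cython's generated C code, and `call_unpacked` and `Point` are hypothetical names.

```python
import types


def call_unpacked(callable_obj, *args):
    """Call `callable_obj`, unpacking a bound method into function + self first."""
    if isinstance(callable_obj, types.MethodType):
        func = callable_obj.__func__      # the underlying plain function
        self_obj = callable_obj.__self__  # the instance the method is bound to
        return func(self_obj, *args)
    return callable_obj(*args)


class Point:
    def __init__(self, x):
        self.x = x

    def shifted(self, dx):
        return self.x + dx


if __name__ == "__main__":
    p = Point(1)
    print(call_unpacked(p.shifted, 2))
```

After unpacking, the call is an ordinary function call with `self` as the first positional argument, so the same fast path applies.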
msg273402 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-08-22 20:38
Ok, I see much better with concrete commits. I'm really happy that
Cython also benefits from these enhancements.

Note: handling keywords is likely to change quickly ;-)
msg274123 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-09-01 13:13
The main feature (_PyFunction_FastCall()) has been merged. Supporting keyword arguments is now handled by another issue (see issue #27830). I close this issue.
History
Date User Action Args
2016-09-01 13:13:54 vstinner set status: open -> closed; resolution: fixed; messages: + msg274123
2016-08-22 20:38:51 vstinner set messages: + msg273402
2016-08-22 17:52:38 scoder set messages: + msg273388
2016-08-22 13:00:55 vstinner set messages: + msg273371
2016-08-22 12:29:11 serhiy.storchaka set messages: + msg273365
2016-08-22 10:37:28 python-dev set messages: + msg273351
2016-08-22 10:33:05 vstinner set messages: + msg273350
2016-08-22 10:14:50 scoder set messages: + msg273349
2016-08-21 11:41:35 scoder set nosy: + scoder
2016-08-20 00:42:08 vstinner set messages: + msg273175
2016-08-19 23:50:42 python-dev set messages: + msg273166
2016-08-19 19:16:19 vstinner set messages: + msg273153
2016-08-19 17:01:25 python-dev set messages: + msg273144
2016-08-19 16:54:06 python-dev set messages: + msg273143
2016-08-19 16:00:05 python-dev set messages: + msg273140
2016-08-19 15:32:20 python-dev set messages: + msg273136
2016-08-16 22:22:40 vstinner set messages: + msg272891
2016-08-16 22:18:41 vstinner set files: + fastcall-3.patch; messages: + msg272888
2016-08-16 21:48:59 python-dev set nosy: + python-dev; messages: + msg272887
2016-08-16 21:05:56 vstinner set messages: + msg272884
2016-08-12 07:38:32 ztane set nosy: + ztane; messages: + msg272503
2016-08-11 20:05:03 serhiy.storchaka set files: + fast_call_alt.patch; messages: + msg272479
2016-08-09 21:30:25 vstinner set messages: + msg272268
2016-08-09 19:45:50 serhiy.storchaka set messages: + msg272264
2016-08-08 22:01:15 vstinner set messages: + msg272197
2016-08-08 00:57:10 vstinner set messages: + msg272139
2016-08-08 00:56:18 vstinner set files: + fastcall-2.patch
2016-08-08 00:53:31 vstinner set files: - fastcall-2.patch
2016-08-08 00:50:22 vstinner set files: + fastcall-2.patch; messages: + msg272138
2016-06-09 21:26:47 vstinner set messages: + msg268059
2016-06-09 20:23:35 serhiy.storchaka set messages: + msg268057
2016-05-26 13:10:52 vstinner set messages: + msg266430
2016-05-26 13:09:42 vstinner set files: + default-May26-13-36-33.log; messages: + msg266429
2016-05-26 10:36:10 vstinner set messages: + msg266424
2016-05-26 10:15:56 vstinner create