classification
Title: convert print() to METH_FASTCALL
Type: performance Stage:
Components: Versions: Python 3.7
process
Status: closed Resolution: later
Dependencies: Superseder:
Assigned To: Nosy List: haypo, inada.naoki, python-dev, serhiy.storchaka
Priority: normal Keywords: patch

Created on 2017-01-17 11:34 by inada.naoki, last changed 2017-01-20 07:51 by haypo. This issue is now closed.

Files
File name Uploaded Description Edit
print-fastcall.patch inada.naoki, 2017-01-17 11:34 review
Messages (13)
msg285632 - (view) Author: INADA Naoki (inada.naoki) * (Python committer) Date: 2017-01-17 11:34
Median +- std dev: [default] 24.2 ms +- 0.4 ms -> [patched] 19.2 ms +- 0.4 ms: 1.26x faster (-21%)
msg285634 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-01-17 12:03
The performance of print() is not critical. It usually involves slow formatting and IO.

_PyArg_ParseStackAndKeywords() is a part of unstable, fast-evolved API. I would beware of using it in common code.
msg285637 - (view) Author: INADA Naoki (inada.naoki) * (Python committer) Date: 2017-01-17 12:35
I see.  AC should support this.
msg285638 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2017-01-17 13:23
Hi,

Serhiy Storchaka: "The performance of print() is not critical. It usually involves slow formatting and IO."

Oh, please, I want a print() using METH_FASTCALL! Not for the performance, but just to help me in my development!

To identify code not using FASTCALL yet, I'm putting a breakpoint on _PyStack_AsTuple() and _PyStack_AsDict(). It's common to stop on print() when testing the performance benchmark suite.

FYI currnently, I'm adding the following Python code to python-gdb.py:
---
class MyBreakpoint (gdb.Breakpoint):
    def stop(self):
        # frame of the caller
        frame = gdb.selected_frame()
        frame = frame.older()

        name = frame.name()
        if name == '_PyCFunction_FastCallKeywords':
            pass
        else:
            # callable->ob_type
            obj = frame.read_var('callable')
            obj = obj.referenced_value()['ob_type']

            obj2 = gdb.lookup_global_symbol('PyType_Type').value()

            if obj == obj2.address:
                return False

            print("don't skip", obj, obj2.address)

        gdb.execute("up")
        return True


MyBreakpoint("_PyStack_AsTuple")
MyBreakpoint("_PyStack_AsDict")
---

These macros make gdb 40x slower :-/

The code is used to skip type_call(): this function is known to not use FASTCALL yet. I plan to add tp_fastnew and tp_fastinit to optimize type_call(), but it's already hard enough to implement tp_fastcall, so I will do it later.

> _PyArg_ParseStackAndKeywords() is a part of unstable, fast-evolved API. I would beware of using it in common code.

Ah. I have a different opinion on that. Since _PyArg_ParseStackAndKeywords() and print() are part of the Python core, we have free to use private fast-moving APIs.

By the way, _PyArg_ParseStackAndKeywords() already uses the new efficient _PyArg_Parser object. Do you plan other enhancements?


INADA Naoki: "I see.  AC should support this."

Right, but IMHO it can be done later.

I tried to contribute to AC but the code is super complex. I understood that AC is not complex, but Python is complex :-D AC is full of corner cases. It's a very long list of special cases :-)
msg285668 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2017-01-17 17:03
I reviewed print-fastcall.patch: LGTM, but I proposed a minor change.


Serhiy Storchaka: "The performance of print() is not critical. It usually involves slow formatting and IO."

I also had the same understanding of print(), but I just analyzed performances of the bm_telco benchmark, and it seems just like handling function parameters of print() take 20% of the runtime!?

   http://bugs.python.org/issue29259#msg285667

bm_telco reference (unpatched) => with issue #29259 tp_fastcall-2.patch and print-fastcall.patch:

   20.9 ms +- 0.5 ms => 16.7 ms +- 0.2 ms

print-fastcall.patch makes bm_telco 20% faster! Just to make sure, I ran again bm_telco only with tp_fastcall-2.patch:

   telco: Median +- std dev: 21.4 ms +- 0.8 ms

Maybe we should optimize _PyStack_AsDict(), but that's a different topic ;-)
msg285672 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-01-17 17:15
Please wait until the support of var-positional arguments be added in Argument Clinic. After that print() could be just converted to Argument Clinic.
msg285692 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2017-01-18 00:59
Serhiy Storchaka: "Please wait until the support of var-positional arguments be added in Argument Clinic. After that print() could be just converted to Argument Clinic."

Why can't we start with METH_FASTCALL and convert to AC later? It seems like print-fastcall.patch will be a major performance enhancement (at least for the very specific bm_telco benchmark), once tp_fastcall field will be added.
msg285780 - (view) Author: Roundup Robot (python-dev) Date: 2017-01-19 11:51
New changeset 0327171f05dd by INADA Naoki in branch 'default':
Issue #29296: convert print() to METH_FASTCALL
https://hg.python.org/cpython/rev/0327171f05dd
msg285781 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2017-01-19 11:55
I pushed print-fastcall.patch since it's simple and has a significant impact on bm_telco benchmark (especially when tp_fastcall slot will be added).

I added a reminder (msg285779) in #20291 to convert print() once AC will support **kwargs.

Can we close the issue, or do you prefer to keep it open?
msg285784 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-01-19 12:13
> Why can't we start with METH_FASTCALL and convert to AC later?

It increases the maintenance burden. When the fastcall protocol be changed, this would require changing more handwritten code.
msg285847 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2017-01-19 21:46
Serhiy Storchaka:
> When the fastcall protocol be changed, this would require changing more handwritten code.

I don't plan to modify the fastcall protocol. I'm not sure that it's
really possible to modify it anymore. It would probably be simpler to
add a new protocol. But it may become a little bit annoying to have to
support so many calling convention :-) So someone should come up with
a serious rationale to add another one :-)

I tried to make fastcall as private as possible, but recently I
noticed that METH_FASTCALL is now public in Python 3.6 API (but
hopefully excluded from the stable ABI). I know that it's already used
by Cython since 0.25:
https://mail.python.org/pipermail/cython-devel/2016-October/004959.html

About the maintenance burden, I'm ok to handle it :-) My final goal is
to use Argument Clinic everywhere. While converting existing code to
AC is a slow process, I consider (from my recent experiences with AC)
that the new code is easier to maintain! It's a slow process beause AC
still has limitaitons, but also because it's used an opportunity to
review (and enhance!) the existing docstrings.
msg285886 - (view) Author: INADA Naoki (inada.naoki) * (Python committer) Date: 2017-01-20 06:36
> Can we close the issue, or do you prefer to keep it open?

I don't care.
msg285890 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2017-01-20 07:51
Since the title of the issue is "convert print() to METH_FASTCALL", I consider that the issue is now done. As I wrote, as soon as AC will support **kwargs, we will be able to print() to use AC.

Thanks Naoki for the change.
History
Date User Action Args
2017-01-20 07:51:29hayposetstatus: open -> closed

dependencies: - Argument Clinic should understand *args and **kwargs parameters
messages: + msg285890
2017-01-20 06:36:39inada.naokisetmessages: + msg285886
2017-01-19 21:46:34hayposetmessages: + msg285847
2017-01-19 12:13:18serhiy.storchakasetmessages: + msg285784
2017-01-19 11:55:37hayposetmessages: + msg285781
2017-01-19 11:51:10python-devsetnosy: + python-dev
messages: + msg285780
2017-01-18 00:59:35hayposetmessages: + msg285692
2017-01-17 17:15:21serhiy.storchakasetmessages: + msg285672
2017-01-17 17:03:15hayposetmessages: + msg285668
2017-01-17 13:23:37hayposetstatus: pending -> open

messages: + msg285638
2017-01-17 12:35:20inada.naokisetstatus: open -> pending
resolution: later
dependencies: + Argument Clinic should understand *args and **kwargs parameters
messages: + msg285637
2017-01-17 12:03:05serhiy.storchakasetnosy: + serhiy.storchaka
messages: + msg285634
2017-01-17 11:34:31inada.naokicreate