classification
Title: Break up COMPARE_OP into logically distinct operations.
Type: performance Stage: resolved
Components: Interpreter Core Versions: Python 3.9
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: Mark.Shannon, pablogsal, rhettinger, serhiy.storchaka
Priority: normal Keywords: patch

Created on 2019-12-29 19:44 by Mark.Shannon, last changed 2020-01-14 10:13 by Mark.Shannon. This issue is now closed.

Pull Requests
URL Status Linked Edit
PR 17754 merged Mark.Shannon, 2019-12-30 10:54
Messages (7)
msg359002 - (view) Author: Mark Shannon (Mark.Shannon) * (Python committer) Date: 2019-12-29 19:44
Currently the COMPARE_OP instruction performs one of four different tasks.
We should break it up into four different instructions, each of which performs only one of those tasks.

The four tasks are:
  Rich comparison (>, <, ==, !=, >=, <=)
  Identity comparison (is, is not)
  Contains test (in, not in)
  Exception matching

The current implementation involves an unnecessary extra dispatch to determine which task to perform.
Comparisons are common operations, so this extra call and unpredictable branch have a cost.

In addition, testing for exception matching is always followed by a branch, so the test and branch can be combined.

I propose adding three new instructions and changing the meaning of `COMPARE_OP`.

COMPARE_OP should only perform rich comparisons, and should call `PyObject_RichCompare` directly.
IS_OP performs identity tests, performs no calls and cannot fail.
CONTAINS_OP tests for 'in' and 'not in' and should call `PySequence_Contains` directly.
JUMP_IF_NOT_EXC_MATCH tests whether the exception matches and jumps if it does not.
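
To make the proposal concrete, here is a toy Python model of the two schemes. It is only a sketch, not CPython's C implementation: the string opargs, the `invert` flag, the use of `isinstance` for exception matching, and the program-counter arguments are all assumptions made for illustration.

```python
import operator

# Illustrative table of rich comparisons; CPython uses integer opargs instead.
RICH = {"<": operator.lt, "<=": operator.le, "==": operator.eq,
        "!=": operator.ne, ">": operator.gt, ">=": operator.ge}

def old_compare_op(oparg, left, right):
    """Old scheme: one instruction, then a second dispatch on oparg."""
    if oparg in RICH:
        return RICH[oparg](left, right)
    if oparg == "is":
        return left is right
    if oparg == "is not":
        return left is not right
    if oparg == "in":
        return left in right
    if oparg == "not in":
        return left not in right
    if oparg == "exception match":
        # Simplification: the real check also accepts exception classes.
        return isinstance(left, right)
    raise ValueError("unknown oparg: %r" % (oparg,))

# Proposed scheme: one handler per task, no inner dispatch on the task kind.
def compare_op(oparg, left, right):
    """Rich comparisons only."""
    return RICH[oparg](left, right)

def is_op(invert, left, right):
    """Identity test; invert=1 models 'is not'. Makes no calls, cannot fail."""
    return (left is right) != bool(invert)

def contains_op(invert, left, right):
    """Containment test; invert=1 models 'not in'."""
    return (left in right) != bool(invert)

def jump_if_not_exc_match(exc, exc_type, target, next_pc):
    """Fused test and branch: return the next (hypothetical) program counter."""
    return next_pc if isinstance(exc, exc_type) else target
```

In this model `jump_if_not_exc_match(KeyError("k"), LookupError, 40, 12)` falls through to 12, while the same call with `ValueError` jumps to 40, so the separate boolean result and the following conditional jump are no longer needed.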
msg359003 - (view) Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2019-12-29 20:39
I think this is a good idea. The only problem I see is that, as the opcode targets are limited, we would be burning 3 more. I especially like the proposal for JUMP_IF_NOT_EXC_MATCH, as PyCmp_EXC_MATCH is only used in the code for try-except and is always followed by a POP_JUMP_IF_FALSE.

Out of curiosity, it would be good to have an idea of the performance gains from specializing the comparison.
msg359005 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2019-12-29 21:32
When COMPARE_OP occurs in a loop, the dispatch tends to be branch predictable. So there may not be a real-world performance benefit to splitting the opcodes.
msg359009 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2019-12-29 21:54
A few years ago I experimented with a special opcode for exception matching. It could make the bytecode a tiny bit smaller and faster, but since catching an exception in Python is relatively expensive, it would not have a significant performance benefit.

As for splitting COMPARE_OP for comparison, identity and containment tests, all these operations look the same from the compiler side: they correspond to the same AST node. Introducing new opcodes will complicate the compiler.
msg359013 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2019-12-29 22:37
> Introducing new opcodes will complicate the compiler.

And it will complicate opcode.py and peephole.c and anything else that touches the word codes.
msg359032 - (view) Author: Mark Shannon (Mark.Shannon) * (Python committer) Date: 2019-12-30 10:42
Moving work from the interpreter to the compiler is always a good idea.

Performance: The compiler is run once per code unit, the interpreter thousands or millions of times.

The compiler is easier to test. Just match the expected bytecode with the actual bytecode. 
The output can be sanity checked by visual inspection.
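
As a rough illustration of that kind of test, the sketch below compiles a few expressions and lists the opcode names the compiler emitted, using the standard `dis` module. The helper name `opnames_for` is hypothetical, and the exact opcode names depend on the interpreter version: before this change all three expressions use COMPARE_OP.

```python
import dis

def opnames_for(expr):
    """Compile an expression and return its opcode names, in order."""
    code = compile(expr, "<example>", "eval")
    return [ins.opname for ins in dis.get_instructions(code)]

# Print the emitted bytecode so it can be checked by eye or in a test.
for expr in ("a < b", "a is b", "a in b"):
    print(expr, "->", opnames_for(expr))
```

A compiler test then reduces to comparing such a list against the expected sequence for the interpreter version under test.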


Although I expect a performance boost, I think this is a worthwhile improvement whether or not it helps 
performance as it makes the instructions better focused.


Pablo, currently there are 117 opcodes; increasing that to 120 is not a problem.
Also there is no reason why we are limited to 256 opcodes in the long term.
Plus, I'm 4 opcodes in credit, thanks to https://bugs.python.org/issue33387 :)

Raymond, regarding the performance of COMPARE_OP, it is not just branch prediction that matters. With this change, the number of (hardware) instructions executed is always reduced, even if branch prediction is no better.

Serhiy, the benefit of having a special opcode for exception matching is not really to speed up exception matching, but to avoid slowing down other tests.
msg359963 - (view) Author: Mark Shannon (Mark.Shannon) * (Python committer) Date: 2020-01-14 10:13
New changeset 9af0e47b1705457bb6b327c197f2ec5737a1d8f6 by Mark Shannon in branch 'master':
bpo-39156: Break up COMPARE_OP into four logically distinct opcodes. (GH-17754)
https://github.com/python/cpython/commit/9af0e47b1705457bb6b327c197f2ec5737a1d8f6
History
Date User Action Args
2020-01-14 10:13:33 Mark.Shannon set status: open -> closed; resolution: fixed; stage: patch review -> resolved
2020-01-14 10:13:07 Mark.Shannon set messages: + msg359963
2019-12-30 10:54:40 Mark.Shannon set keywords: + patch; stage: patch review; pull_requests: + pull_request17190
2019-12-30 10:42:20 Mark.Shannon set messages: + msg359032
2019-12-29 22:37:20 rhettinger set messages: + msg359013
2019-12-29 21:54:48 serhiy.storchaka set nosy: + serhiy.storchaka; messages: + msg359009
2019-12-29 21:32:14 rhettinger set nosy: + rhettinger; messages: + msg359005
2019-12-29 20:39:06 pablogsal set messages: + msg359003
2019-12-29 20:34:09 pablogsal set nosy: + pablogsal
2019-12-29 19:44:42 Mark.Shannon create