Title: Add RETURN_NONE bytecode instruction
Type: enhancement Stage: patch review
Components: Interpreter Core Versions: Python 3.7
Status: closed Resolution: rejected
Dependencies: Superseder:
Assigned To: Nosy List: brett.cannon, matrixise, rhettinger, serhiy.storchaka, tim.peters, vstinner, yselivanov
Priority: normal Keywords: patch

Created on 2016-11-25 10:55 by vstinner, last changed 2016-11-28 17:07 by vstinner. This issue is now closed.

File name Uploaded Description Edit
return_none.patch vstinner, 2016-11-25 10:55 review
Messages (11)
msg281696 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-11-25 10:55
Attached patch adds a new bytecode instruction: RETURN_NONE. It is similar to "LOAD_CONST; RETURN_VALUE" but it avoids the need of adding None to constants of the code object. For function with an empty body, it reduces the stack size from 1 to 0.

Example on reference Python 3.7:

>>> def func(): return
>>> import dis; dis.dis(func)
  1           0 LOAD_CONST               0 (None)
              2 RETURN_VALUE
>>> func.__code__.co_stacksize

Example on patched Python 3.7:

>>> def func(): return
>>> import dis; dis.dis(func)
  1           0 RETURN_CONST
>>> func.__code__.co_stacksize

If the function has a docstring, RETURN_CONST avoids adding None to code constants:

>>> def func():
...  "docstring"
...  return
>>> func.__code__.co_consts

I will now run benchmarks on the patch.
msg281699 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-11-25 11:05
I'm proposing this patch because I noticed that reducing the number of opcodes makes Python faster in my old registervm project: a fork of CPython which uses a register-based bytecode:

I also plan to propose a CALL_PROCEDURE method to replace CALL_FUNCTION+POP_TOP. Only for the the simple CALL_FUNCTION, not for complex CALL_FUNCTION_KW nor CALL_FUNCTION_EX.
msg281702 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-11-25 11:27
Do you want to add RETURN_NONE or RETURN_CONST? Or both?

Adding new special opcodes can decrease the size of the code and increase performance of some cases. But it adds maintenance burden, increases the complexity of the compiler and peephole optimizer, and increases the size of ceval loop. The latter can have negative effect on the performance. I think we should add new specialized opcodes only if they adds measurable gain to global performance or large speed up of important particular cases.

It would help if you gather the statistics of RETURN_* opcodes. How many RETURN_VALUE, RETURN_CONST and RETURN_NONE instructions are compiled and executed during running Python tests? Compare it with total number of compiled and executed instructions.
msg281719 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-11-25 13:45
Results of performance 0.5.0 on speed-python:

haypo@speed-python$ python3 -m perf compare_to -G --min-speed=3 2016-11-23_19-34-default-3d660ed2a60e.json patch.json 
Slower (3):
- spectral_norm: 265 ms +- 2 ms -> 277 ms +- 7 ms: 1.04x slower
- xml_etree_iterparse: 217 ms +- 2 ms -> 226 ms +- 4 ms: 1.04x slower
- nqueens: 261 ms +- 2 ms -> 269 ms +- 3 ms: 1.03x slower

Faster (3):
- scimark_sor: 519 ms +- 10 ms -> 496 ms +- 7 ms: 1.05x faster
- mako: 43.0 ms +- 0.8 ms -> 41.5 ms +- 0.2 ms: 1.04x faster
- call_method: 16.2 ms +- 0.2 ms -> 15.7 ms +- 0.3 ms: 1.03x faster

Benchmark hidden because not significant (58): 2to3, call_method_slots, call_method_unknown, (...)

Hum, boring result. This change alone doesn't change any significant speedup, even some slowndon. Maybe it's just a bad idea. Maybe it should be combined with other new bytecode instructions. Maybe only a full new instruction set using registers show significant speedup. I don't know :-(
msg281720 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-11-25 14:08
> Do you want to add RETURN_NONE or RETURN_CONST? Or both?

Only RETURN_NONE because on some corner cases it allow to avoid completely the None constant from code.co_consts and reduce the stack size.

I'm not sure that I want to start taking the same road of WPython which added a *lot* of specialized instructions (combining two or more existing instructions). As you said, it has a cost on the maintenance, and might have a negative impact if _PyEval_EvalFrameDefault() becomes too big.
msg281821 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-11-27 11:15
The pair LOAD_CONST/RETURN_VALUE is on 19th place of the top of opcode pairs (see msg269391 in issue27255). Not all of these constants are None. And since the time of LOAD_CONST is much smaller then the time of RETURN_VALUE (the latter includes destroying a frame and should be in a pair with CALL_FUNCTION), I think the performance effect of RETURN_NONE is much less than 1%.
msg281831 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2016-11-27 18:17
When we added LIST_APPEND (the first of the custom opcodes intended for optimization), it was done only because it was frequently used in inner loops where it provided real benefits to users.

In contrast this opcode seems like a waste.  Historically for opcodes, we've valued orthogonality and parsimony.  The opcode set was intentionally kept simple and minimal.  The recent opcode additions have disregarded these values.
msg281835 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2016-11-27 21:30
My vote is to close this issue since the performance isn't panning out.
msg281837 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2016-11-27 21:43
I also don't see a good reason to keep this open now - adds complication for no quantifiable payoff.
msg281891 - (view) Author: Yury Selivanov (yselivanov) * (Python committer) Date: 2016-11-28 17:02
-1; we have too many opcodes already. Let's not complicate the code if there's no performance improvement.
msg281893 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-11-28 17:07
Sorry, I didn't want to bother you. I should run the benchmark *before* opening an issue next time :-) I agree that the speedup is negligible, so it's not worth it. The main purpose of the patch was an optimization. I close the issue. Thanks ;-)
Date User Action Args
2016-11-28 17:07:17vstinnersetstatus: open -> closed
resolution: rejected
messages: + msg281893
2016-11-28 17:02:21yselivanovsetnosy: + yselivanov
messages: + msg281891
2016-11-27 21:43:07tim.peterssetmessages: + msg281837
2016-11-27 21:30:45brett.cannonsetnosy: + brett.cannon
messages: + msg281835
2016-11-27 18:17:19rhettingersetnosy: + tim.peters
messages: + msg281831
2016-11-27 11:15:21serhiy.storchakasetmessages: + msg281821
2016-11-25 14:08:21vstinnersetmessages: + msg281720
2016-11-25 13:45:10vstinnersetmessages: + msg281719
2016-11-25 11:27:24serhiy.storchakasetmessages: + msg281702
2016-11-25 11:05:41vstinnersetmessages: + msg281699
2016-11-25 10:59:43serhiy.storchakasetnosy: + rhettinger

type: enhancement
components: + Interpreter Core
stage: patch review
2016-11-25 10:55:24vstinnersettitle: Add RETURN_NONE -> Add RETURN_NONE bytecode instruction
2016-11-25 10:55:15vstinnercreate