This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

classification
Title: implement per-opcode cache in ceval
Type: performance
Stage: resolved
Components: Interpreter Core
Versions: Python 3.8

process
Status: closed
Resolution: fixed
Dependencies: 26058, 26110
Superseder:
Assigned To: yselivanov
Nosy List: Jim Fasarakis-Hilliard, Mark.Shannon, Roy Williams, Ryan May, brett.cannon, eric.snow, francismb, gvanrossum, josh.r, jstasiak, methane, ncoghlan, ned.deily, vstinner, yselivanov
Priority: normal
Keywords: patch

Created on 2016-01-27 17:11 by yselivanov, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Files
File name Uploaded Description
opcache1.patch yselivanov, 2016-01-27 18:27
opcache2.patch yselivanov, 2016-02-01 19:03
opcache3.patch yselivanov, 2016-04-29 23:06
Pull Requests
URL Status Linked
PR 12884 merged methane, 2019-04-20 10:48
PR 13775 merged methane, 2019-06-03 12:55
PR 13809 merged vstinner, 2019-06-04 14:41
PR 13997 merged vstinner, 2019-06-12 02:04
PR 14000 merged miss-islington, 2019-06-12 02:41
Repositories containing patches
https://github.com/1st1/cpython/tree/opcache5
Messages (42)
msg259040 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2016-01-27 18:15
I assume there's going to be a patch or more of a description of what your idea is? :)
msg259042 - (view) Author: Yury Selivanov (yselivanov) * (Python committer) Date: 2016-01-27 18:27
Yeah, I needed the URL of the issue for my email to python-dev ;)

Here's a link to the email, which explains a lot about this patch: https://mail.python.org/pipermail/python-dev/2016-January/142945.html

The patch is also attached (opcache1.patch).
msg259051 - (view) Author: Yury Selivanov (yselivanov) * (Python committer) Date: 2016-01-27 19:58
BTW, there are a couple of unit-tests that fail.  Both can be easily fixed.

To really move this thing forward, we need to profile the memory usage.  First, it would be interesting to see how much additional memory is consumed if we optimize every code object.  That will give us an idea of the overhead, and if it is significant, we can experiment with various heuristics to minimize it.
msg259052 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2016-01-27 20:01
If you run hg.python.org/benchmarks on Linux it has a flag to measure memory.
msg259060 - (view) Author: Yury Selivanov (yselivanov) * (Python committer) Date: 2016-01-27 20:45
> If you run hg.python.org/benchmarks on Linux it has a flag to measure memory.

Great.  I'll re-run the benchmarks.  BTW, the latest results are here: https://gist.github.com/1st1/aed69d63a2ff4de4c7be
msg259064 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-01-27 21:36
If I understand correctly, this change requires waiting until PEP 509 is accepted, right? Well, I'm not really surprised, since a global cache is mentioned as a use case of PEP 509, and other global cache patches are mentioned in its Prior Art section.
msg259065 - (view) Author: Yury Selivanov (yselivanov) * (Python committer) Date: 2016-01-27 21:43
Yes, this patch depends on PEP 509.
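
For readers following along, here is a rough, illustrative Python model of the PEP 509 mechanism the cache relies on. The real version tag is a C-level field on PyDictObject, not a Python attribute, and all names below are made up for illustration:

class VersionedDict(dict):
    """Toy model of PEP 509: every mutation bumps a version tag."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.version = 0

    def __setitem__(self, key, value):
        super().__setitem__(key, value)
        self.version += 1

    def __delitem__(self, key):
        super().__delitem__(key)
        self.version += 1

g = VersionedDict(x=1)
cached = (g.version, g["x"])     # cache a lookup together with the version
assert cached[0] == g.version    # no mutation since caching: cache is valid
g["x"] = 2                       # a mutation bumps the version...
assert cached[0] != g.version    # ...so the cached entry is detectably stale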
msg259344 - (view) Author: Yury Selivanov (yselivanov) * (Python committer) Date: 2016-02-01 19:03
Attaching a new version of the patch.  Issues #26058 and #26110 need to be merged before we can start reviewing it.
msg259371 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-02-02 08:47
I'm concerned by the test_descr failure.

======================================================================
ERROR: test_vicious_descriptor_nonsense (test.test_descr.ClassPropertiesAndMethods)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/haypo/prog/python/default/Lib/test/test_descr.py", line 4176, in test_vicious_descriptor_nonsense
    self.assertEqual(c.attr, 1)
AttributeError: 'C' object has no attribute 'attr'

----------------------------------------------------------------------


======================================================================
FAIL: test_objecttypes (test.test_sys.SizeofTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/haypo/prog/python/default/Lib/test/test_sys.py", line 895, in test_objecttypes
    check(get_cell().__code__, size('5i9Pi3P'))
  File "/home/haypo/prog/python/default/Lib/test/support/__init__.py", line 1475, in check_sizeof
    test.assertEqual(result, size, msg)
AssertionError: 192 != 160 : wrong size for <class 'code'>: got 192, expected 160

----------------------------------------------------------------------


======================================================================
FAIL: test_pycfunction (test.test_gdb.PyBtTests)
Verify that "py-bt" displays invocations of PyCFunction instances
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/haypo/prog/python/default/Lib/test/test_gdb.py", line 857, in test_pycfunction
    self.assertIn('#0 <built-in method gmtime', gdb_output)
AssertionError: '#0 <built-in method gmtime' not found in 'Breakpoint 1 at 0x6518e3: file ./Modules/timemodule.c, line 338.\n[Thread debugging using libthread_db enabled]\nUsing host libthread_db library "/lib64/libthread_db.so.1".\n\nBreakpoint 1, time_gmtime (self=<module at remote 0x7ffff7e6ff58>, args=(1,)) at ./Modules/timemodule.c:338\n338\t    if (!parse_time_t_args(args, "|O:gmtime", &when))\n#1 <built-in method gmtime of module object at remote 0x7ffff7e6ff58>\n#4 Frame 0x7ffff7fb27b8, for file <string>, line 3, in foo ()\n#7 Frame 0x7ffff7f423f8, for file <string>, line 5, in bar ()\n#10 Frame 0x7ffff7fb25e0, for file <string>, line 6, in <module> ()\n'

----------------------------------------------------------------------
msg259410 - (view) Author: Yury Selivanov (yselivanov) * (Python committer) Date: 2016-02-02 18:20
Victor,

Thanks for the initial review.  I'll work on the patch sometime later next week.  

As for test_vicious_descriptor_nonsense -- yeah, I saw that too.  I know what's going on there and I know how to fix it.  FWIW, that test exercises a very counter-intuitive and quirky CPython implementation detail.
msg259522 - (view) Author: Francis MB (francismb) * Date: 2016-02-03 22:15
From the two checks on Python/compile.c:

+ expr_ty meth = e->v.Call.func;
[...]
+    /* Check [...] that
+       the call doesn't have keyword parameters. */
[...]
+    /* Check that there are no *varargs types of arguments. */
[...]

I just wonder how many times those kinds of checks/guards are done
across the whole CPython code base (the second one seems expensive).

(Naive idea):
Wouldn't some kind of fast function description (e.g. bit flags
or 'e->v.Call.func.desc') be generally helpful? The function description
could have: 'has_keywords' or 'has_varargs', ...

Thanks in advance!
msg259523 - (view) Author: Yury Selivanov (yselivanov) * (Python committer) Date: 2016-02-04 00:46
> From the two checks on Python/compile.c:

Stuff done in compile.c is cached in *.pyc files.

The for-loop you saw shouldn't take a lot of time - it iterates through function parameters, of which an average function has no more than 3-6.
msg262623 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-03-29 20:21
See the issue #26647 which may make the implementation of this cache simpler.

See also my message about inline caching:
https://bugs.python.org/issue26647#msg262622
msg264528 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2016-04-29 22:14
Victor brought this patch to my attention as the motivation for PEP 509. Unfortunately the patch doesn't apply cleanly and I don't have time to try and piece together all the different parts. From the description on python-dev you linked to there are actually a few different patches that all work together and result in a significant speedup. Is there any chance that you could get all this up to date so it applies cleanly to the CPython 3.6 repo (the hg version!), and report some realistic benchmarks? The speedups reported sound great, but I worry that there's a catch (e.g. memory cost that doesn't become apparent until you have a large program, or minimal speedup on realistic code). This is currently holding up approval of PEP 509.
msg264532 - (view) Author: Yury Selivanov (yselivanov) * (Python committer) Date: 2016-04-29 22:43
Hi Guido, I'll try to update the patch soon.

> but I worry that there's a catch (e.g. memory cost that doesn't become apparent until you have a large program, or minimal speedup on realistic code).

Here's an excerpt from my email [1] to Python-dev on memory footprint:

To measure the max/average memory impact, I tuned my code to optimize
*every* code object on *first* run.  Then I ran the entire Python test
suite.  The Python test suite + standard library together contain around
72,395 code objects, which required 20 MB of memory for caches.  The test
process consumed around 400 MB of memory.  Thus, in the absolute
worst-case scenario, the overhead is about 5%.

[1] https://mail.python.org/pipermail/python-dev/2016-February/143025.html
msg264533 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2016-04-29 22:51
Thanks, that's a cool stat. Please do update the patch.
msg264534 - (view) Author: Yury Selivanov (yselivanov) * (Python committer) Date: 2016-04-29 23:06
Alright, attaching a rebased patch (opcache3.patch).

Some caveats:

1. The patch embeds a fairly outdated PEP 509 implementation.

2. It contains a PoC implementation of the LOAD_METHOD opcode that should really be cleaned up (and some parts of it rewritten).

3. I was going to spend more time on the ceval.c code to make it lighter and, perhaps, break large chunks of code into small functions.

All in all, this patch can be used for benchmarking, but it's not yet ready for a thorough code review.

One last thing: if you flip OPCODE_CACHE_STATS to 1 in ceval.c, Python will print debug stats on the opcode cache.
msg264658 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2016-05-02 18:43
I'm confused by the relationship between this and issue 26110. That seems to be a much simpler patch (which also doesn't apply cleanly). If 26110 really speeds up method calls by 20%, what does this add? (By which I mean (a) what additional optimizations does it have, and (b) what additional speedup does it have?)

I'm asking because I'm still trying to look for reasons why I should accept PEP 509, and this is brought up as a reason.
msg264660 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2016-05-02 18:45
I'm also looking for some example code that would show clearly the kind of speedup we're talking about.
msg264664 - (view) Author: Yury Selivanov (yselivanov) * (Python committer) Date: 2016-05-02 19:46
> I'm confused by the relationship between this and issue 26110.


This patch embeds the implementation of 26110.  I'm no longer sure it was a good idea to have two issues instead of one; everybody seems to be confused about that ;)


> That seems to be a much simpler patch (which also doesn't apply cleanly). If 26110 really speeds up method calls by 20%, what does this add? (By which I mean (a) what additional optimizations does it have, and (b) what additional speedup does it have?)


I'm sorry for the long response, please bare with me.  This issue is complex, and it's very hard to explain it all in a short message.


The patch from 26110 implements the LOAD_METHOD/CALL_METHOD pair of opcodes.  The idea is that we can avoid instantiation of BoundMethod objects for code that looks like "something.method(...)".  I wanted to first get the patch from 26110 into shape, commit it, and then use the patch from this issue to add additional speedups.


This patch implements a generic per-opcode cache.  Using that cache, it speeds up the LOAD_GLOBAL, LOAD_ATTR, and LOAD_METHOD (from 26110) opcodes.  The cache works on a per-code-object basis, only optimizing code objects that were run more than 1,000 times.


* LOAD_GLOBAL uses the cache to store pointers for requested names.  Since the cache is per-opcode, the name is always the same for a given LOAD_GLOBAL.  The cache logic uses PEP 509 to invalidate the cache (although the cache is almost never invalidated for real code).

This optimization makes micro-optimizations like "def smth(len=len)" obsolete.  LOAD_GLOBAL becomes much faster, almost as fast as LOAD_FAST.
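
For illustration only (this is not code from the patch), here is that classic trick and what it changes at the bytecode level:

import dis

def slow(seq):
    return len(seq)       # 'len' is found via LOAD_GLOBAL (globals -> builtins)

def fast(seq, len=len):
    return len(seq)       # 'len' is a local here, so it's a LOAD_FAST

dis.dis(slow)             # the call site shows LOAD_GLOBAL len
dis.dis(fast)             # the call site shows LOAD_FAST len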


* LOAD_ATTR uses Armin Rigo's clever types cache, and a modified PyDict_GetItem (PyDict_GetItemHint) which accepts a suggested position of the value in the hash table.  Basically, LOAD_ATTR stores in its cache a pointer to the type of the object it works with, its tp_version_tag, and a hint for PyDict_GetItemHint.  When we have a cache hit, LOAD_ATTR becomes super fast, since it only needs to look up the key/value in the type's dict at a known offset (the real code is a bit more complex, to handle all the edge cases of the descriptor protocol etc).

Python programs have a lot of LOAD_ATTRs.  It also seems that dicts of classes are usually stable, and that objects rarely shadow class attributes.
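
To sketch the "hint" idea in Python terms (illustrative only; the real PyDict_GetItemHint works on the C dict's internal slot table, and the function name below is the only real one):

def get_item_hint(slots, key, hint):
    """slots: a list of (key, value) pairs standing in for dict entries.

    Returns (value, new_hint); the remembered slot is tried before scanning.
    """
    if 0 <= hint < len(slots) and slots[hint][0] == key:
        return slots[hint][1], hint        # fast path: the hint was right
    for i, (k, v) in enumerate(slots):     # slow path: ordinary lookup
        if k == key:
            return v, i                    # caller stores i as the new hint
    raise KeyError(key)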


* LOAD_METHOD is optimized very similarly to LOAD_ATTR.  The idea is that if the opcode is optimized, then we simply store a pointer to the function we want to call with CALL_METHOD.  Since 'obj.method' usually implies that 'method' is implemented on 'obj.__class__', this optimization makes LOAD_METHOD even faster.  That's how `s.startswith('abc')` becomes as fast as `s[:3] == 'abc'`.
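
With the LOAD_METHOD patch applied (and in CPython 3.7+, where it eventually landed), `dis` shows the opcode pair directly; a quick illustration:

import dis

dis.dis("s.startswith('abc')")   # LOAD_NAME s; LOAD_METHOD startswith; CALL_METHOD 1
dis.dis("s[:3] == 'abc'")        # BUILD_SLICE + BINARY_SUBSCR, then COMPARE_OP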


> If 26110 really speeds up method calls by 20%, what does this add? 

It speeds up method calls by another 15%, and it also speeds up global name lookups and attribute lookups.


> (b) what additional speedup does it have

Here are some benchmark results: https://gist.github.com/1st1/b1978e17ee8b82cc6432

- the call_method micro-benchmark is 35% faster
- 2to3 is 7-8% faster
- richards is 18% faster
- many other benchmarks are 10-15% faster.  Those that appear slower aren't stable, i.e. in one run they are slower, in another they are faster.

I'd say each of the above optimizations speeds up macro-benchmarks by 2-4%.  Combined, they speed up CPython by 7-15%.


> I'm asking because I'm still trying to look for reasons why I should accept PEP 509, and this is brought up as a reason.

Re PEP 509 and these patches:

1. I want to first find time to finish up and commit 26110.
2. Then I'll do some careful benchmarking for this patch and write another update with results.
3. This patch adds to the complexity of ceval, but if it really speeds up CPython it's well worth it.  PEP 509 will need to be approved if we decide to move forward with this patch.

I'd say that if you aren't sure about PEP 509 right now, then we can wait a couple of months and decide later.
msg264671 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2016-05-02 21:13
OK, I get it. I think it would be really helpful if issue 26110 was updated, reviewed and committed -- it sounds like a good idea on its own, and it needs some burn-in time due to the introduction of two new opcodes. (That's especially important since there's also a patch introducing wordcode, i.e. issue 26647, and undoubtedly the two patches will conflict.) It also needs to show benchmark results on its own (but I think you've got that).

I am also happy to see the LOAD_GLOBAL optimization, and it alone may be sufficient to save PEP 509 (though I would recommend editing the text of the PEP dramatically to focus on a brief but crisp description of the change itself and the use that LOAD_GLOBAL would make of it and the microbench results; it currently is a pain to read the whole thing).

I have to read up on what you're doing to LOAD_ATTR/LOAD_METHOD. In the meantime I wonder how that would fare in a world where most attr lookups are in namedtuples.

As a general recommendation I would actually prefer more separate patches (even though it's harder on you), just with very clearly stated relationships between them.

A question about the strategy of only optimizing code objects that are called a lot. Doesn't this mean that a "main" function containing a major inner loop doesn't get the optimization it might deserve?

PS. I like you a lot but there's no way I'm going to "bare" with you. :-)
msg264674 - (view) Author: Yury Selivanov (yselivanov) * (Python committer) Date: 2016-05-02 21:38
> OK, I get it. I think it would be really helpful if issue 26110 was updated, reviewed and committed -- it sound like a good idea on its own, and it needs some burn-in time due to the introduction of two new opcodes. (That's especially important since there's also a patch introducing wordcode, i.e. issue 26647 and undoubtedly the two patches will conflict.) It also needs to show benchmark results on its own (but I think you've got that).

Right.  Victor asked me to review the wordcode patch (and maybe even commit it), and I'll try to do that this week.  The LOAD_METHOD/CALL_METHOD patch needs some refactoring; right now it's PoC-quality.  I agree it has to go first.

> I am also happy to see the LOAD_GLOBAL optimization, and it alone may be sufficient to save PEP 509 (though I would recommend editing the text of the PEP dramatically to focus on a brief but crisp description of the change itself and the use that LOAD_GLOBAL would make of it and the microbench results; it currently is a pain to read the whole thing).

Alright.  I'll work on this with Victor.

> I have to read up on what you're doing to LOAD_ATTR/LOAD_METHOD. In the mean time I wonder how that would fare in a world where most attr lookups are in namedtuples.

I think there will be no difference, but I can add a microbenchmark and see.

> As a general recommendation I would actually prefer more separate patches (even though it's harder on you), just with very clearly stated relationships between them.

NP. This is a big change to review, and the last thing I want is to accidentally make CPython slower.

> A question about the strategy of only optimizing code objects that are called a lot. Doesn't this mean that a "main" function containing a major inner loop doesn't get the optimization it might deserve?

Right, the "main" function won't be optimized.  There are two reasons of why I added this threshold of 1000 calls before optimizations:

1. We want to limit the memory overhead.  We generally don't want to optimize module code objects, or code objects that are called just a few times over long periods of time.  We can introduce an additional heuristic to detect long-running loops, but I'm afraid that it will add overhead to the loops that don't need this optimization.

2. We want the environment to become stable -- the builtins and globals namespaces fully populated (and, perhaps, mutated), classes fully initialized, etc.
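
In Python terms, the warm-up heuristic looks roughly like this (an illustrative sketch only; in the patch the counter and the cache hang off the C code object, and all names below are made up):

OPT_THRESHOLD = 1000   # number of calls before a code object gets a cache

class HotCode:
    def __init__(self, code):
        self.code = code
        self.calls = 0
        self.cache = None          # per-opcode cache, allocated lazily

    def on_call(self):
        if self.cache is None:
            self.calls += 1
            if self.calls >= OPT_THRESHOLD:
                # Only now pay the memory cost: one entry per
                # cacheable opcode in this code object.
                self.cache = {}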

> PS. I like you a lot but there's no way I'm going to "bare" with you. :-)

Haha, this is my typo of the month, I guess ;)
msg264676 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2016-05-02 21:45
All sounds good. Glad the issue of long-running loops is at least on your
radar.
msg310845 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2018-01-27 06:36
With the 3.7 beta deadline just around the corner, 3.8 will be the next opportunity to reconsider this idea.
msg310956 - (view) Author: Yury Selivanov (yselivanov) * (Python committer) Date: 2018-01-28 15:39
> With the 3.7 beta deadline just around the corner, 3.8 will be the next opportunity to reconsider this idea.

Nick, Ned, Guido,

Would it be OK if we add this to 3.7beta2?

I feel kind of bad about this one... a few thoughts:  

1. The speedup is about 5-10% on macro-benchmarks!  This is significant.

2. Inada-san offered me help with this, and I declined, saying that I planned to do more research/experiments and commit it before 3.7.  And then I didn't have time to do it because of other work I did for CPython.

3. Even though the optimization is invasive, I feel confident about it.  It just so happens that any modification we make to ceval is covered 100% by the tests, and by any code you execute.  Any regression or anomaly should become apparent almost immediately.

4. We'll have quite a bit of time to see if this works or not.  This isn't new functionality, so we can always pull it from the first RC build if necessary.
msg310960 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2018-01-28 15:55
It's up to the release manager, but personally it feels like you're pushing
too hard. Since you've done that frequently in the past I think we should
help you by declining your request. 3.8 is right around the corner (on a
Python release timescale) and this will give people a nice incentive to
upgrade. Subsequent betas exist to stabilize the release, not to put in
more new stuff.

Now maybe we'll need to come up with a different way to do releases, but
that's more a python-dev discussion.
msg310961 - (view) Author: Yury Selivanov (yselivanov) * (Python committer) Date: 2018-01-28 15:57
> It's up to the release manager, but personally it feels like you're pushing
too hard. Since you've done that frequently in the past I think we should
help you by declining your request.

NP, I totally understand.

> Now maybe we'll need to come up with a different way to do releases, but
that's more a python-dev discussion.

That would be great.
msg310965 - (view) Author: Ned Deily (ned.deily) * (Python committer) Date: 2018-01-28 16:36
I concur with Guido and Nick; let's target this for 3.8.
msg311028 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-01-28 22:53
> Would it be OK if we add this to 3.7beta2?

IMHO we need more time than the beta1=>beta2 window to properly review the design, test the implementation, fix any potential regressions, etc.

Such a change is perfect for the *start* of a development cycle, but IMHO it's bad when we are close to the *end*.

beta1=>final is supposed to be made only of bugfixes, not to introduce new bugs... (I bet that adding any feature inevitably introduces new bugs)
msg344418 - (view) Author: Inada Naoki (methane) * (Python committer) Date: 2019-06-03 12:31
New changeset 91234a16367b56ca03ee289f7c03a34d4cfec4c8 by Inada Naoki in branch 'master':
bpo-26219: per opcode cache for LOAD_GLOBAL (GH-12884)
https://github.com/python/cpython/commit/91234a16367b56ca03ee289f7c03a34d4cfec4c8
msg344419 - (view) Author: Inada Naoki (methane) * (Python committer) Date: 2019-06-03 12:39
I committed the cache only for LOAD_GLOBAL, which is much simpler than
the ones for LOAD_ATTR or LOAD_METHOD.

Caches for other instructions will be implemented in 3.9 or later.
msg344421 - (view) Author: Inada Naoki (methane) * (Python committer) Date: 2019-06-03 13:34
New changeset 395420e2a35937fa9777fc3c8b0086629db95e27 by Inada Naoki in branch 'master':
bpo-26219: remove unused code (GH-13775)
https://github.com/python/cpython/commit/395420e2a35937fa9777fc3c8b0086629db95e27
msg344470 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-06-03 20:35
The opcode cache introduced an issue in reference leak hunting: see bpo-37146.
msg344587 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-06-04 15:08
New changeset ea9f168957f7236aae2185f65e0dc608a9a7a37b by Victor Stinner in branch 'master':
bpo-26219: Fix compiler warning in _PyCode_InitOpcache() (GH-13809)
https://github.com/python/cpython/commit/ea9f168957f7236aae2185f65e0dc608a9a7a37b
msg344591 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-06-04 15:15
I understand that the initial issue (add an opcode cache) has been implemented. Well done Yury and INADA-san!

Please open new issues for follow-up like the new optimization ideas ;-)
msg345300 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-06-12 02:41
New changeset 376ce9852eec4e97745c723f0dd0fe64383c6cd3 by Victor Stinner in branch 'master':
bpo-26219: Fix compiler warning in _PyCode_InitOpcache() (GH-13997)
https://github.com/python/cpython/commit/376ce9852eec4e97745c723f0dd0fe64383c6cd3
msg345315 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-06-12 10:15
New changeset 996e52623af3854552d41751e0c2522bc0a7e84f by Victor Stinner (Miss Islington (bot)) in branch '3.8':
bpo-26219: Fix compiler warning in _PyCode_InitOpcache() (GH-13997) (GH-14000)
https://github.com/python/cpython/commit/996e52623af3854552d41751e0c2522bc0a7e84f
msg354089 - (view) Author: Mark Shannon (Mark.Shannon) * (Python committer) Date: 2019-10-07 12:25
"What's new in Python 3.8" says that this change speeds up the LOAD_GLOBAL opcode by 40%.
That seems meaningless, as a program cannot execute LOAD_GLOBAL in isolation, so any test would include other bytecodes as well.

What's the evidence for the claimed speedup?
What's the speedup on the benchmark suite?
msg354090 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-10-07 12:28
> What's the evidence for the claimed speedup?
> What's the speedup on the benchmark suite?

https://github.com/python/cpython/pull/12884#issuecomment-485091040
points to:
https://github.com/methane/sandbox/tree/master/2019/opcache_load_global#opcache-for-load_global

== Microbenchmark ==

$ master/python -m perf timeit -s 'def foo(): int; str; bytes; float; int; str; bytes; float' -- 'foo()'
.....................
Mean +- std dev: 213 ns +- 5 ns

$ opcache/python -m perf timeit -s 'def foo(): int; str; bytes; float; int; str; bytes; float' -- 'foo()'
.....................
Mean +- std dev: 120 ns +- 2 ns

== pyperformance ==

$ ./cpython/python -m perf compare_to master.json opcache_load_global.json -G  --min-speed=2
Slower (2):
- pickle: 19.1 us +- 0.2 us -> 19.7 us +- 0.8 us: 1.03x slower (+3%)
- unpickle_list: 8.66 us +- 0.04 us -> 8.85 us +- 0.06 us: 1.02x slower (+2%)

Faster (23):
- scimark_lu: 424 ms +- 22 ms -> 384 ms +- 4 ms: 1.10x faster (-9%)
- regex_compile: 359 ms +- 4 ms -> 330 ms +- 1 ms: 1.09x faster (-8%)
- django_template: 250 ms +- 3 ms -> 231 ms +- 2 ms: 1.08x faster (-8%)
- unpickle_pure_python: 802 us +- 12 us -> 754 us +- 9 us: 1.06x faster (-6%)
- pickle_pure_python: 1.04 ms +- 0.01 ms -> 991 us +- 15 us: 1.05x faster (-5%)
- hexiom: 20.8 ms +- 0.2 ms -> 19.8 ms +- 0.1 ms: 1.05x faster (-5%)
- logging_simple: 18.4 us +- 0.2 us -> 17.6 us +- 0.2 us: 1.05x faster (-4%)
- sympy_expand: 774 ms +- 5 ms -> 741 ms +- 3 ms: 1.04x faster (-4%)
- json_dumps: 28.1 ms +- 0.2 ms -> 27.0 ms +- 0.2 ms: 1.04x faster (-4%)
- logging_format: 20.4 us +- 0.2 us -> 19.6 us +- 0.3 us: 1.04x faster (-4%)
- richards: 147 ms +- 2 ms -> 141 ms +- 1 ms: 1.04x faster (-4%)
- meteor_contest: 189 ms +- 1 ms -> 182 ms +- 1 ms: 1.04x faster (-4%)
- xml_etree_iterparse: 226 ms +- 2 ms -> 217 ms +- 2 ms: 1.04x faster (-4%)
- sympy_str: 358 ms +- 3 ms -> 345 ms +- 4 ms: 1.04x faster (-4%)
- sqlalchemy_imperative: 44.0 ms +- 1.2 ms -> 42.4 ms +- 1.2 ms: 1.04x faster (-4%)
- sympy_sum: 167 ms +- 1 ms -> 161 ms +- 1 ms: 1.04x faster (-4%)
- nqueens: 217 ms +- 1 ms -> 211 ms +- 1 ms: 1.03x faster (-3%)
- fannkuch: 1.09 sec +- 0.01 sec -> 1.07 sec +- 0.00 sec: 1.03x faster (-3%)
- raytrace: 1.11 sec +- 0.02 sec -> 1.08 sec +- 0.01 sec: 1.03x faster (-3%)
- dulwich_log: 122 ms +- 1 ms -> 119 ms +- 1 ms: 1.03x faster (-3%)
- logging_silent: 419 ns +- 5 ns -> 410 ns +- 5 ns: 1.02x faster (-2%)
- sympy_integrate: 33.5 ms +- 0.1 ms -> 32.8 ms +- 0.2 ms: 1.02x faster (-2%)
- pathlib: 40.8 ms +- 0.4 ms -> 40.0 ms +- 0.5 ms: 1.02x faster (-2%)

Benchmark hidden because not significant (32): (...)
msg354091 - (view) Author: Mark Shannon (Mark.Shannon) * (Python committer) Date: 2019-10-07 12:41
Given that
def foo(): int; str; bytes; float; int; str; bytes; float
can trivially be rewritten as
def foo(): pass
I think that benchmark is meaningless.

I really don't think we should be making claims like "40% speedup" for such contrived examples.

It looks like the speedup is real, but only about 2% overall.
The "what's new" should reflect that, IMO.
msg354122 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2019-10-07 17:14
I personally think it would be fine to change the wording to say "measurable speed-up" and not attribute a specific number.
msg354181 - (view) Author: Inada Naoki (methane) * (Python committer) Date: 2019-10-08 09:51
On Mon, Oct 7, 2019 at 9:41 PM Mark Shannon <report@bugs.python.org> wrote:
> Given that
> def foo(): int; str; bytes; float; int; str; bytes; float
> can trivially be rewritten as
> def foo(): pass
> I think that benchmark is meaningless.
>

Do you mean every microbenchmark measuring a single feature is meaningless?

I think it is worthwhile for readers who understand what the
`LOAD_GLOBAL` instruction means
(e.g. people who use techniques like `int_ = int` before a loop).
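
For example, this is the kind of hand-written hoisting the cache makes unnecessary (illustrative code; function names are made up):

def count_ints_hoisted(items):
    int_ = int                     # hoist the builtin: LOAD_GLOBAL runs once
    n = 0
    for x in items:
        if type(x) is int_:        # LOAD_FAST inside the loop
            n += 1
    return n

def count_ints_plain(items):
    n = 0
    for x in items:
        if type(x) is int:         # LOAD_GLOBAL, but now served from the cache
            n += 1
    return n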
History
Date User Action Args
2022-04-11 14:58:26  admin  set  github: 70407
2019-12-27 19:07:14  eric.snow  set  nosy: + eric.snow
2019-10-08 09:51:06  methane  set  messages: + msg354181
2019-10-07 17:14:46  brett.cannon  set  messages: + msg354122
2019-10-07 12:41:00  Mark.Shannon  set  messages: + msg354091
2019-10-07 12:28:06  vstinner  set  messages: + msg354090
2019-10-07 12:25:18  Mark.Shannon  set  nosy: + Mark.Shannon; messages: + msg354089
2019-06-12 10:15:13  vstinner  set  messages: + msg345315
2019-06-12 02:41:31  miss-islington  set  pull_requests: + pull_request13863
2019-06-12 02:41:22  vstinner  set  messages: + msg345300
2019-06-12 02:04:11  vstinner  set  pull_requests: + pull_request13860
2019-06-04 15:15:23  vstinner  set  status: open -> closed; resolution: fixed; messages: + msg344591; stage: patch review -> resolved
2019-06-04 15:08:37  vstinner  set  messages: + msg344587
2019-06-04 14:41:50  vstinner  set  pull_requests: + pull_request13693
2019-06-03 20:35:16  vstinner  set  messages: + msg344470
2019-06-03 13:34:34  methane  set  messages: + msg344421
2019-06-03 12:55:09  methane  set  pull_requests: + pull_request13662
2019-06-03 12:39:50  methane  set  messages: + msg344419
2019-06-03 12:31:12  methane  set  messages: + msg344418
2019-04-20 10:48:04  methane  set  pull_requests: + pull_request12809
2018-01-28 22:53:44  vstinner  set  messages: + msg311028
2018-01-28 16:36:00  ned.deily  set  messages: + msg310965
2018-01-28 15:57:36  yselivanov  set  messages: + msg310961
2018-01-28 15:55:03  gvanrossum  set  messages: + msg310960
2018-01-28 15:39:51  yselivanov  set  nosy: + ned.deily, methane; messages: + msg310956
2018-01-27 06:36:25  ncoghlan  set  messages: + msg310845; versions: + Python 3.8, - Python 3.6
2017-05-25 12:29:01  Jim Fasarakis-Hilliard  set  nosy: + Jim Fasarakis-Hilliard
2016-09-23 20:39:16  Roy Williams  set  nosy: + Roy Williams
2016-09-13 22:56:29  Ryan May  set  nosy: + Ryan May
2016-05-15 10:32:36  jstasiak  set  nosy: + jstasiak
2016-05-03 20:40:36  josh.r  set  nosy: + josh.r
2016-05-02 21:45:04  gvanrossum  set  messages: + msg264676
2016-05-02 21:38:31  yselivanov  set  messages: + msg264674
2016-05-02 21:13:15  gvanrossum  set  messages: + msg264671
2016-05-02 19:46:08  yselivanov  set  messages: + msg264664
2016-05-02 18:45:35  gvanrossum  set  messages: + msg264660
2016-05-02 18:43:41  gvanrossum  set  messages: + msg264658
2016-04-29 23:06:31  yselivanov  set  files: + opcache3.patch; messages: + msg264534
2016-04-29 22:51:19  gvanrossum  set  messages: + msg264533
2016-04-29 22:43:57  yselivanov  set  messages: + msg264532
2016-04-29 22:14:43  gvanrossum  set  nosy: + gvanrossum; messages: + msg264528
2016-03-29 20:21:07  vstinner  set  messages: + msg262623
2016-02-04 00:46:02  yselivanov  set  messages: + msg259523
2016-02-03 22:15:59  francismb  set  nosy: + francismb; messages: + msg259522
2016-02-02 18:20:13  yselivanov  set  messages: + msg259410
2016-02-02 08:47:08  vstinner  set  messages: + msg259371
2016-02-01 19:18:12  yselivanov  set  hgrepos: + hgrepo333
2016-02-01 19:04:08  yselivanov  set  files: + opcache2.patch; messages: + msg259344
2016-01-27 21:43:11  yselivanov  set  messages: + msg259065
2016-01-27 21:36:24  vstinner  set  messages: + msg259064
2016-01-27 20:45:38  yselivanov  set  messages: + msg259060
2016-01-27 20:01:58  brett.cannon  set  messages: + msg259052
2016-01-27 19:58:30  yselivanov  set  messages: + msg259051
2016-01-27 18:34:44  yselivanov  set  dependencies: + PEP 509: Add ma_version to PyDictObject
2016-01-27 18:34:14  yselivanov  set  dependencies: + Speedup method calls 1.2x
2016-01-27 18:33:23  yselivanov  set  nosy: + ncoghlan, vstinner
2016-01-27 18:27:27  yselivanov  set  files: + opcache1.patch; keywords: + patch; messages: + msg259042
2016-01-27 18:15:00  brett.cannon  set  nosy: + brett.cannon; messages: + msg259040
2016-01-27 17:11:27  yselivanov  create