
Author yselivanov
Recipients brett.cannon, francismb, gvanrossum, ncoghlan, vstinner, yselivanov
Date 2016-05-02.19:46:07
SpamBayes Score -1.0
Marked as misclassified Yes
Message-id <1462218368.99.0.666324440124.issue26219@psf.upfronthosting.co.za>
In-reply-to
Content
> I'm confused by the relationship between this and issue 26110.


This patch embeds the implementation of 26110.  I'm no longer sure it was a good idea to have two issues instead of one, everybody seems to be confused about that ;)


> That seems to be a much simpler patch (which also doesn't apply cleanly). If 26110 really increases method calls by 20%, what does this add? (By which I mean (a) what additional optimizations does it have, and (b) what additional speedup does it have?)


I'm sorry for the long response, please bear with me.  This issue is complex, and it's very hard to explain it all in a short message.


The patch from 26110 implements the LOAD_METHOD/CALL_METHOD pair of opcodes.  The idea is that we can avoid instantiation of BoundMethod objects for code that looks like "something.method(...)".  I wanted to first get the patch from 26110 into shape, commit it, and then use the patch from this issue to add further speedups.
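For context, here is what the bound-method object looks like from the Python side; this is the temporary allocation that the LOAD_METHOD/CALL_METHOD pair avoids (the class and names below are just an illustration):

```python
class Greeter:
    def hello(self):
        return "hi"

g = Greeter()

# Evaluating "g.hello" creates a bound-method object pairing the function
# with the instance.  A plain "g.hello()" call normally allocates one of
# these just to make the call; LOAD_METHOD/CALL_METHOD skips that.
bm = g.hello
assert bm.__func__ is Greeter.hello   # the underlying function
assert bm.__self__ is g               # the instance it is bound to
assert bm() == "hi"
```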


This patch implements a generic per-opcode cache.  Using that cache, it speeds up the LOAD_GLOBAL, LOAD_ATTR, and LOAD_METHOD (from 26110) opcodes.  The cache works on a per-code-object basis, optimizing only code objects that have run more than 1,000 times.
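A minimal Python sketch of the "only optimize hot code objects" policy (the real implementation is in C inside ceval; the names and the lazily allocated dict here are hypothetical):

```python
HOT_THRESHOLD = 1000  # mirrors the 1,000-run cutoff described above

class CodeCache:
    """Per-code-object cache; entries are keyed by instruction offset."""

    def __init__(self):
        self.run_count = 0
        self.entries = None  # allocated lazily, only for hot code objects

    def on_call(self):
        self.run_count += 1
        if self.entries is None and self.run_count > HOT_THRESHOLD:
            # The code object is now "hot": individual opcodes may start
            # storing their lookup results here, keyed by their offset.
            self.entries = {}

cache = CodeCache()
for _ in range(HOT_THRESHOLD + 1):
    cache.on_call()
assert cache.entries is not None  # caching is now enabled
```

Allocating the cache lazily means cold code (which is most code) pays nothing beyond a counter increment.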


* LOAD_GLOBAL uses the cache to store pointers to the requested names.  Since the cache is per-opcode, the name is always the same for a given LOAD_GLOBAL.  The cache logic uses PEP 509 to invalidate entries (although the cache is almost never invalidated for real-world code).

This optimization makes micro-optimizations like "def smth(len=len)" obsolete.  LOAD_GLOBAL becomes much faster, almost as fast as LOAD_FAST.
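A rough sketch of that invalidation scheme, assuming a PEP 509-style version counter on dicts (simulated below, since plain Python doesn't expose the version tag; the function names are hypothetical):

```python
class VersionedDict(dict):
    """Simulates PEP 509: every mutation bumps a version tag."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.version = 0

    def __setitem__(self, key, value):
        super().__setitem__(key, value)
        self.version += 1

def load_global(ns, name, cache):
    # Cache hit: the namespace hasn't changed since we stored the pointer,
    # so we can return the cached value without touching the dict.
    if cache is not None and cache[0] == ns.version:
        return cache[1], cache
    value = ns[name]                    # slow path: real dict lookup
    return value, (ns.version, value)   # refill the cache

ns = VersionedDict(len=len)
v, cache = load_global(ns, "len", None)    # miss: fills the cache
v2, cache = load_global(ns, "len", cache)  # hit: no dict lookup
assert v is len and v2 is len
ns["len"] = max                            # mutation bumps the version
v3, cache = load_global(ns, "len", cache)  # stale entry detected, re-looked up
assert v3 is max
```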


* LOAD_ATTR uses Armin Rigo's clever types cache, and a modified PyDict_GetItem (PyDict_GetItemHint), which accepts a suggested position of the value in the hash table.  Basically, LOAD_ATTR stores in its cache a pointer to the type of the object it works with, its tp_version_tag, and a hint for PyDict_GetItemHint.  When we have a cache hit, LOAD_ATTR becomes super fast, since it only needs to look up the key/value in the type's dict at a known offset (the real code is a bit more complex, to handle all the edge cases of the descriptor protocol etc).
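A simplified Python rendering of that cache entry (the real thing lives in C, and tp_version_tag isn't visible from Python, so it is simulated here; instance shadowing and descriptors are deliberately omitted):

```python
import itertools

_type_versions = {}       # simulated tp_version_tag per class
_counter = itertools.count(1)

def version_of(cls):
    # In CPython this is cls.tp_version_tag, bumped whenever the class
    # (or a base class) is mutated; here we just assign stable fake tags.
    return _type_versions.setdefault(cls, next(_counter))

def load_attr(obj, name, cache):
    cls = type(obj)
    # Hit: same class, same version tag -> the class dict is unchanged,
    # so the cached value (and, in C, the PyDict_GetItemHint position
    # in the hash table) is still valid.
    if cache is not None and cache[0] is cls and cache[1] == version_of(cls):
        return cache[2], cache
    value = getattr(obj, name)                 # slow path
    return value, (cls, version_of(cls), value)

class Point:
    x = 10

p = Point()
v, cache = load_attr(p, "x", None)    # miss: fills the cache
v2, cache = load_attr(p, "x", cache)  # hit: served from the cache
assert v == v2 == 10
```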

Python programs have a lot of LOAD_ATTRs.  It also seems that dicts of classes are usually stable, and that objects rarely shadow class attributes.


* LOAD_METHOD is optimized very similarly to LOAD_ATTR.  The idea is that if the opcode is optimized, then we simply store a pointer to the function we want to call with CALL_METHOD.  Since 'obj.method' usually implies that 'method' is implemented on 'obj.__class__', this optimization makes LOAD_METHOD even faster.  That's how `s.startswith('abc')` becomes as fast as `s[:3] == 'abc'`.
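Conceptually, the optimized call fetches the function straight from the class dict and calls it with the instance as the first argument, never materializing a bound method; this can be demonstrated from pure Python:

```python
s = "abcdef"

# What CALL_METHOD does after a LOAD_METHOD cache hit, conceptually:
# call the function found on the class, passing the instance explicitly.
func = type(s).__dict__["startswith"]
assert func(s, "abc") == s.startswith("abc")

# And the claim from the message: both spellings give the same answer.
assert s.startswith("abc") == (s[:3] == "abc")
```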


> If 26110 really increases method calls by 20%, what does this add? 

It speeds up method calls by another 15%, and also speeds up global name lookups and attribute lookups.


> (b) what additional speedup does it have

Here are some benchmark results: https://gist.github.com/1st1/b1978e17ee8b82cc6432

- the call_method micro-benchmark is 35% faster
- 2to3 is 7-8% faster
- richards is 18% faster
- many other benchmarks are 10-15% faster; those that appear slower aren't stable, i.e. one run they are slower, another they are faster

I'd say each of the above optimizations speeds up macro-benchmarks by 2-4%.  Combined, they speed up CPython 7-15%.


> I'm asking because I'm still trying to look for reasons why I should accept PEP 509, and this is brought up as a reason.

Re PEP 509 and these patches:

1. I want to first find time to finish up and commit 26110.
2. Then I'll do some careful benchmarking for this patch and write another update with results.
3. This patch adds to the complexity of ceval, but if it really speeds up CPython it's well worth it.  PEP 509 will need to be approved if we decide to move forward with this patch.

I'd say that if you aren't sure about PEP 509 right now, then we can wait a couple of months and decide later.