classification
Title: Implement LOAD_GLOBAL opcode cache
Type: performance
Stage: patch review
Components: Interpreter Core
Versions: Python 3.7
process
Status: open
Resolution:
Dependencies:
Superseder:
Assigned To: yselivanov
Nosy List: brett.cannon, haypo, inada.naoki, serhiy.storchaka, yselivanov, ztane
Priority: normal
Keywords: patch

Created on 2016-09-14 20:48 by yselivanov, last changed 2016-12-20 11:20 by inada.naoki.

Files
load_globals_cache.patch (uploaded by yselivanov, 2016-09-14 20:48)
Messages (13)
msg276485 - (view) Author: Yury Selivanov (yselivanov) * (Python committer) Date: 2016-09-14 20:48
The attached patch implements an opcode cache for LOAD_GLOBAL opcode, making it 2x faster.

The idea is to use the new co_extra field of code objects & PEP 509 to attach a cache structure pointing directly to the resolved global name.  When globals or builtins are updated, the cache is invalidated.

The patch adds a new file "ceval_cache.h" which provides some infrastructure to ease the pain of implementing new caches for other opcode types.  I can strip down all macros from that file if Victor finds it too hard to review.

Here's a simple script I used to make sure that the cache is working: https://gist.github.com/1st1/a9660aabdcf6b8bc6205b9fe39a82bba

You can also set OPCACHE_COLLECT_STATS to 1 to get debug stats output for the cache.
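The invalidation scheme described above can be sketched in Python rather than the actual C implementation. This is a hypothetical illustration only: the names `VersionedDict` and `GlobalCache` are invented here, and the real patch stores its cache in the code object's co_extra field and reads the PEP 509 version tag from the C-level dict struct.

```python
class VersionedDict(dict):
    """Dict that bumps a version tag on every mutation (PEP 509 style)."""
    def __init__(self, *args, **kwargs):
        self.version = 0
        super().__init__(*args, **kwargs)
    def __setitem__(self, key, value):
        super().__setitem__(key, value)
        self.version += 1
    def __delitem__(self, key):
        super().__delitem__(key)
        self.version += 1

class GlobalCache:
    """Per-instruction cache entry for one LOAD_GLOBAL site (sketch)."""
    def __init__(self):
        self.entry = None  # (globals_version, builtins_version, value)
    def load_global(self, name, globals_d, builtins_d):
        if self.entry is not None:
            g_ver, b_ver, value = self.entry
            # Trust the cached pointer only while neither dict has changed.
            if g_ver == globals_d.version and b_ver == builtins_d.version:
                return value  # cache hit: no dict lookups at all
        # Slow path: normal globals-then-builtins resolution, then refill.
        value = globals_d.get(name, builtins_d.get(name))
        self.entry = (globals_d.version, builtins_d.version, value)
        return value
```

Because the entry holds only version tags and a (borrowed, in the C version) value, any write to globals or builtins makes the tags mismatch and forces the slow path on the next lookup.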
msg276519 - (view) Author: Antti Haapala (ztane) * Date: 2016-09-15 06:03
I just noticed that you're actually testing a builtin rather than a name in the module globals. How is the performance with module globals?
msg276520 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-09-15 06:39
Does this prolong the lifetime of the cached global objects?
msg276521 - (view) Author: Yury Selivanov (yselivanov) * (Python committer) Date: 2016-09-15 06:40
> Does this prolongate the lifetime of cached global objects?

No.  I store borrowed references.  The idea is that if the state of globals/builtins dict has changed, we invalidate the cache.
msg276522 - (view) Author: Yury Selivanov (yselivanov) * (Python committer) Date: 2016-09-15 06:43
Serhiy, feel free to review the patch.  Guido and Ned okayed it to be committed before 3.6b2.

Currently the patch only optimizes one opcode, LOAD_GLOBAL, but the ceval_cache.h file provides the infrastructure to easily implement caches for more opcodes.  I can simplify that file (remove all the macros) if it's too hard to review.
msg276645 - (view) Author: Yury Selivanov (yselivanov) * (Python committer) Date: 2016-09-16 00:05
I'm going to commit this patch tomorrow.
msg276647 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2016-09-16 00:14
Can you please post results of the performance benchmark suite?

If you give me more time, I can review the patch.
msg276697 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-09-16 11:29
There is too much magic in ceval_cache.h. Since the patch adds caching only for the LOAD_GLOBAL opcode, it would be much clearer to write functions only for the LOAD_GLOBAL opcode, without multilayer macros.

What will happen if there are more than 255 LOAD_GLOBALs? Wouldn't the 256th LOAD_GLOBAL return an incorrect cached value for the first LOAD_GLOBAL (with index 0)?
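The aliasing concern raised here can be illustrated with a toy sketch (hypothetical; whether the patch actually truncates the cache index to one byte is exactly what the question asks). If each LOAD_GLOBAL site's cache slot were identified by an 8-bit index, sites past the 256th would collide with earlier ones:

```python
def cache_slot(opcode_index, index_bits=8):
    """Map an instruction's ordinal LOAD_GLOBAL index to a cache slot,
    truncated to index_bits bits (the hypothesized failure mode)."""
    return opcode_index & ((1 << index_bits) - 1)

# With 8 bits, slots 0..255 are distinct, but the 257th site (index 256)
# aliases slot 0 and would read the wrong cached value.
assert cache_slot(0) == cache_slot(256) == 0
assert cache_slot(255) == 255
```

A fix would be either a wider index type or simply not caching sites beyond the representable range.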
msg276703 - (view) Author: Antti Haapala (ztane) * Date: 2016-09-16 11:49
I wouldn't weight the builtin lookup speed much at all: in all non-trivial applications the ratio of builtins to other names diminishes, and if there is a tight loop, it is always possible to speed it up with `min = min` if you do not need to support monkey patching.
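The `min = min` trick mentioned above works because a name bound locally (here via a default argument, a common variant) is fetched with LOAD_FAST instead of the LOAD_GLOBAL-plus-builtins fallback, at the cost of no longer seeing a monkey-patched replacement. A sketch, with invented function names:

```python
def running_min_global(values):
    s = values[0]
    for v in values:
        s = min(s, v)       # LOAD_GLOBAL + builtins lookup each iteration
    return s

def running_min_local(values, min=min):  # builtin captured once at def time
    s = values[0]
    for v in values:
        s = min(s, v)       # LOAD_FAST each iteration
    return s
```

Both return the same result; only the name-resolution cost per iteration differs.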
msg283616 - (view) Author: INADA Naoki (inada.naoki) * (Python committer) Date: 2016-12-19 12:24
I'll update this patch and #10401 and then run benchmark suite when I have time.

As far as I look quickly, #10401 uses namei as cache key
and this patch uses opcode index as cache key, am I right?
msg283629 - (view) Author: Yury Selivanov (yselivanov) * (Python committer) Date: 2016-12-19 16:14
> As far as I look quickly, #10401 uses namei as cache key
and this patch uses opcode index as cache key, am I right?

Correct.  Actually, if you don't mind, I'd like to update the patch myself. I have a few ideas how to restructure it and add support for LOAD_ATTR.
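The difference between the two keying schemes can be made concrete with a toy sketch (hypothetical; neither patch's data layout is reproduced here). Keying by namei shares one cache entry among every LOAD_GLOBAL of the same name, while keying by instruction offset gives each bytecode site its own entry:

```python
# Two bytecode sites loading the same global 'x':
#   offset 0: LOAD_GLOBAL 0 (x)
#   offset 6: LOAD_GLOBAL 0 (x)
sites = [(0, 0), (6, 0)]  # (instruction_offset, namei)

entries_by_namei = {namei for _, namei in sites}
entries_by_offset = {off for off, _ in sites}

assert len(entries_by_namei) == 1   # one shared entry for both sites
assert len(entries_by_offset) == 2  # one entry per instruction
```

Per-offset keys use more memory but let each site hold a result specific to that point in the code, which matters once the same machinery is extended to LOAD_ATTR.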
msg283662 - (view) Author: INADA Naoki (inada.naoki) * (Python committer) Date: 2016-12-20 03:34
>  Actually, if you don't mind, I'd like to update the patch myself. I have a few ideas how to restructure it and add support for LOAD_ATTR.

Sounds interesting.
I'll wait before touching this patch.  There is enough time before Python 3.7,
and enough other issues I can work on.
msg283681 - (view) Author: INADA Naoki (inada.naoki) * (Python committer) Date: 2016-12-20 11:20
memo: http://bugs.python.org/issue26219 may be related to this.
History
Date                 User              Action  Args
2016-12-20 11:20:53  inada.naoki       set     messages: + msg283681
2016-12-20 03:34:54  inada.naoki       set     messages: + msg283662
2016-12-19 16:14:50  yselivanov        set     messages: + msg283629
2016-12-19 12:24:09  inada.naoki       set     priority: critical -> normal; nosy: + inada.naoki; messages: + msg283616
2016-09-16 11:49:06  ztane             set     messages: + msg276703
2016-09-16 11:29:26  serhiy.storchaka  set     messages: + msg276697
2016-09-16 00:14:29  haypo             set     messages: + msg276647
2016-09-16 00:05:32  yselivanov        set     messages: + msg276645
2016-09-15 06:43:41  yselivanov        set     messages: + msg276522
2016-09-15 06:40:39  yselivanov        set     messages: + msg276521
2016-09-15 06:39:13  serhiy.storchaka  set     nosy: + serhiy.storchaka; messages: + msg276520; versions: + Python 3.7, - Python 3.6
2016-09-15 06:03:56  ztane             set     nosy: + ztane; messages: + msg276519
2016-09-14 21:16:36  gvanrossum        set     nosy: - gvanrossum
2016-09-14 21:14:58  brett.cannon      set     nosy: + brett.cannon
2016-09-14 20:48:00  yselivanov        create