New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement LOAD_GLOBAL opcode cache #72345
Comments
The attached patch implements an opcode cache for LOAD_GLOBAL opcode, making it 2x faster. The idea is to use the new co_extra field of code objects & PEP-509 to attach a cache structure pointing directly to the resolved global name. When globals or builtins are updated, the cache is invalidated. The patch adds a new file "ceval_cache.h" which provides some infrastructure to ease the pain of implementing new caches for other opcode types. I can strip down all macros from that file if Victor finds it too hard to review. Here's a simple script I used to make sure that the cache is working: https://gist.github.com/1st1/a9660aabdcf6b8bc6205b9fe39a82bba You can also set OPCACHE_COLLECT_STATS to 1 to get debug stats output for the cache. |
I just noticed that you're actually testing a builtin instead of something in just module globals. How is the performance with module globals? |
Does this prolongate the lifetime of cached global objects? |
No. I store borrowed references. The idea is that if the state of globals/builtins dict has changed, we invalidate the cache. |
Serhiy, feel free to review the patch. Guido and Ned okayed it to be committed before 3.6b2. Currently the patch only optimizes one opcode -- LOAD_GLOBAL, but cveal_cache.h file provides the infrastructure to easily implement more opcode caches. I can simplify that file (remove all macros) if it's too hard to review it. |
I'm going to commit this patch tomorrow. |
Can you please post results of the performance benchmark suite? If you give me more time, I can review the patch. |
There is too much magic in ceval_cache.h. Since the patch adds caching only for the LOAD_GLOBAL opcode, it would much clearer if write functions only for the LOAD_GLOBAL opcode, without multilayer macros. What will happen if there more than 255 LOAD_GLOBALs. Wouldn't 256th LOAD_GLOBAL return incorrect cached value for the the LOAD_GLOBAL (with index 0)? |
I wouldn't actually consider the builtin lookup speed almost at all, in all non-trivial applications the ratio of builtins to any other names will diminish; and if there is a tight loop, it is possible to always speed it up by |
Correct. Actually, if you don't mind, I'd like to update the patch myself. I have a few ideas how to restructure it and add support for LOAD_ATTR. |
Sounds interesting. |
memo: http://bugs.python.org/issue26219 may be relating to this. |
This is implemented in bpo-26219. |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
The text was updated successfully, but these errors were encountered: