
Author ag6502
Date 2006-12-23.22:52:35
I was already changing the code in the direction you're pointing out. I've attached the current version (still not complete!) of the patch; more specifically:
- Cached lookups are now stored in an array of structures instead of in two separate arrays (see the sketch after this list)
- Macros are used for handling of cache entries
- The timestamp is now unsigned and there is optional support for 64-bit timestamps
- Support for stalling the timestamp at the 32-bit wrap point
- Removed the LOAD_FAST_HACK (it was just an experiment with oprofile and was left in by mistake)
- Fixed the handling of memory allocation failures in code object creation
- Removed exporting of the timestamp value at the Python level
- Used the largest LOAD_ATTR/LOAD_GLOBAL entry found as the cache size
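
Roughly, the cache entry layout and macros are along these lines (a simplified sketch, not the exact code or names from the patch):

    #include "Python.h"

    #ifdef CACHED_LOOKUPS_64BIT_TIMESTAMPS      /* hypothetical switch for 64-bit timestamps */
    typedef unsigned PY_LONG_LONG lookup_timestamp_t;
    #else
    typedef unsigned long lookup_timestamp_t;   /* unsigned 32-bit timestamp */
    #endif

    /* One entry per cached LOAD_GLOBAL/LOAD_ATTR slot: the dictionary
       timestamp seen when the lookup was done, plus the value found.
       Keeping both in one structure (instead of two parallel arrays)
       keeps the data touched by a single lookup together in memory. */
    typedef struct {
        lookup_timestamp_t timestamp;
        PyObject *value;        /* borrowed reference to the cached result */
    } cached_lookup_entry;

    /* Macros for handling cache entries. */
    #define CACHE_ENTRY_VALID(entry, dict_ts)  ((entry)->timestamp == (dict_ts))
    #define CACHE_ENTRY_STORE(entry, dict_ts, obj) \
        do { (entry)->timestamp = (dict_ts); (entry)->value = (obj); } while (0)

    /* The timestamp counter "stalls" (stops being incremented) at the 32-bit
       wrap point, so a stale entry can never look fresh again; the caches are
       then flushed at GC time and the counter restarted. */
    #define TIMESTAMP_MAX          ((lookup_timestamp_t)0xFFFFFFFFUL)
    #define TIMESTAMP_STALLED(ts)  ((ts) >= TIMESTAMP_MAX)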

Still missing are the sorting of co_names to pack cache slots and
the cache reset at gc time if the dictionary timestamp is stalled.

The cached lookup can be used at two levels:
-DCACHED_LOOKUPS enables caching of LOAD_GLOBAL results
-DCACHED_MODULE_LOOKUPS (in addition to -DCACHED_LOOKUPS) also caches the result of a lookup in a module.
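
Conceptually, the first level boils down to something like this (a simplified sketch, not the actual ceval.c change; load_global_cached is just an illustrative helper and relies on the entry type and macros sketched above):

    #ifdef CACHED_LOOKUPS
    static PyObject *
    load_global_cached(cached_lookup_entry *entry, lookup_timestamp_t globals_ts,
                       PyObject *globals, PyObject *builtins, PyObject *name)
    {
        PyObject *value;

        /* Fast path: the globals dictionary has not changed since the value
           was cached, so the cached result is still valid. */
        if (CACHE_ENTRY_VALID(entry, globals_ts))
            return entry->value;

        /* Slow path: the normal globals -> builtins chain, then refill the
           cache entry with the fresh result. */
        value = PyDict_GetItem(globals, name);
        if (value == NULL)
            value = PyDict_GetItem(builtins, name);
        if (value != NULL)
            CACHE_ENTRY_STORE(entry, globals_ts, value);
        return value;
    }
    #endif /* CACHED_LOOKUPS */

    #ifdef CACHED_MODULE_LOOKUPS
    /* With this second switch also defined, a LOAD_ATTR whose object is a
       module goes through an analogous path keyed on the timestamp of the
       module's __dict__ instead of the globals timestamp. */
    #endif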

Speed results are sometimes very interesting and sometimes strange; after checking what gcc was doing with register allocation, I found measurable differences just from moving statements between logically equivalent places (probably those places were not really equivalent once you take into account aliasing that will never actually happen, but that a C compiler must assume is possible). I have no idea whether the speedups I've measured are better or worse on other processors/architectures.
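
The aliasing effect is the usual one; as a standalone illustration (not code from the patch):

    /* As far as the compiler knows, entry[i] might overlap *timestamp, so
       *timestamp must be reloaded from memory on every iteration instead of
       being kept in a register.  Rearranging logically equivalent statements
       changes how many such forced reloads gcc ends up emitting. */
    void
    touch_entries(unsigned long *timestamp, unsigned long *entry, int n)
    {
        int i;
        for (i = 0; i < n; i++)
            entry[i] = *timestamp;  /* re-read: entry[i] may alias *timestamp */
    }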

File Added: cached_lookups_10.patch