Issue1560
Created on 2007-12-05 21:00 by ntoronto, last changed 2007-12-10 21:22 by rhettinger.
| Files | ||||
|---|---|---|---|---|
| File name | Uploaded | Description | Edit | Remove |
| fastattr-0.patch.txt | ntoronto, 2007-12-05 21:00 | |||
| Messages (6) | |||
|---|---|---|---|
| msg58229 - (view) | Author: Neil Toronto (ntoronto) | Date: 2007-12-05 21:00 | |
I've attached a patch to accelerate type and instance lookups using a specialized cache. It includes a benchmarking script (fastattr_test.py) that tests both successful and failing lookups on list, dict and tuple, and a simple, deep hierarchy. Benchmark results are here: http://spreadsheets.google.com/ccc?key=pHIJrYc_pnIUpTm6QSG2gZg&hl=en_US Everything tested in fastattr_test.py is faster except list.__init__ and list().__init__ (and I haven't got a clue why, so some pointers would be nice). Pybench is faster overall. TryRaiseExcept is faster for some non-obvious reason. CreateNewInstances is a little slower, which I'll discuss in a bit. Setting type attributes is slower, but I haven't benchmarked that yet. It may not happen often enough that we care as long as it's not noticeably slow in general usage. In benchmarks the patch does a little slower, it may be in part because I removed a manually inlined _PyType_Lookup from PyObject_GenericGetAttr. Something like it can be put back if it needs to be. It works in a fairly obvious way. Every type has a tp_cache, which is a custom dict type that caches the first value in a type dict in the MRO for a given name. Lazy cached lookups are done via _PyType_CachingLookup, which is a drop-in replacement for _PyType_Lookup. The only way to set an attribute on a type is via its setattr, so type's setattr notifies subclasses to invalidate specific cache entries. The cache dict is custom for two reasons: 1. The regular dict is a little slower because it's more general. The custom dict is restricted to string-exact keys (types fall back to no caching if this constraint is violated). Because it's internal to typeobject.c, it's safe for lookups to return entry pointers rather than values, so lookups only have to be done once, even on cache misses. 2. Caching missing attributes is crucial for speed on instance attribute lookups. Both type and metatype instances check all the way through the MRO for a descriptor before even trying to find an attribute. It's usually missing. Having to go through the cache machinery to find a missing attribute for every attribute lookup is expensive. However, storing all missing attribute queries in the cache is bad - it can grow unboundedly through hasattr(inst, <random name>) calls. What's needed is a dict that knows that some of its entries are transient and doesn't copy them over on resize. It wasn't clear how to implement that efficiently with a regular dict (I suspect it's not possible), so I created a new dict type that considers entries transient (meaning the attribute is missing) when me_value == NULL. This is also very good for failing hasattr(...) calls and try...inst.method()...except style duck typing. Now, about the CreateNewInstances benchmark. It creates three classes with __init__ methods that assign a few attributes. The missing descriptors are cached, and then the cache is resized, which clears the cached missing descriptors. Increasing the minimum cache size from 4 to 8 clears up the problem. However, for any class, SOME minimum cache size will properly cache missing descriptors and some other one won't. I have some solutions for this floating around in my head, which I'll try shortly. (One idea is for missing attributes, if they're not missing on the *instance*, to be permanent in the cache.) But I'd like to get this patch off my hard drive and into other people's hands for testing, poking, prodding, and getting feedback on what's going right and what's not. |
|||
| msg58355 - (view) | Author: Guido van Rossum (gvanrossum) | Date: 2007-12-10 18:05 | |
Are you withdrawing this in favor of #1568? |
|||
| msg58360 - (view) | Author: Neil Toronto (ntoronto) | Date: 2007-12-10 19:16 | |
Yes, as well #1700288 (Armin's attribute lookup caching patch ported to 2.6) or #1685986 (Armin's original for 2.4), or whatever Raymond finds most convenient. |
|||
| msg58365 - (view) | Author: Guido van Rossum (gvanrossum) | Date: 2007-12-10 19:51 | |
That's still ambiguous -- do you want any of those to be closed too? Clearly we're not going to patch 2.4. |
|||
| msg58367 - (view) | Author: Neil Toronto (ntoronto) | Date: 2007-12-10 20:03 | |
Sorry - I'll be more clear. I'd love to see 2.6 get attribute lookup caching, and Kevin's port (#1700288) of Armin's original 2.4 patch (#1685986) does that for 2.6. #1685986 (2.4) should be closed and #1700288 (2.6) should remain open. But Raymond has indicated that he's currently working on #1685986 - I think for 2.6, but it's not clear. |
|||
| msg58371 - (view) | Author: Raymond Hettinger (rhettinger) | Date: 2007-12-10 21:22 | |
I'm trying to look at all of them. Having it split into several patches and getting frequent updates and posts is making it difficult. Please communicate with me directly (python at rcn dot com) so I can find out which versions are the latest and the reason behind each variation. |
|||
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2007-12-10 21:22:25 | rhettinger | set | nosy:
+ rhettinger messages: + msg58371 |
| 2007-12-10 20:03:12 | ntoronto | set | messages: + msg58367 |
| 2007-12-10 19:51:17 | gvanrossum | set | messages: + msg58365 |
| 2007-12-10 19:50:25 | gvanrossum | set | status: open -> closed resolution: out of date |
| 2007-12-10 19:16:26 | ntoronto | set | messages: + msg58360 |
| 2007-12-10 18:05:16 | gvanrossum | set | nosy:
+ gvanrossum messages: + msg58355 |
| 2007-12-05 21:45:14 | loewis | set | keywords: + patch |
| 2007-12-05 21:00:24 | ntoronto | create | |