Message78925
> I'm not an expert in this kind of optimizations. Could we gain more
> speed by making the dispatcher table more dense? Python has less than
> 128 opcodes (len(opcode.opmap) == 113) so they can be squeezed in a
> smaller table. I naively assume a smaller table increases the amount of
> cache hits.

Well, a new release has no binary compatibility constraint, so this can
be tried and benchmarked, or simply done anyway!
On x86_64 the jump table takes 8 bytes per pointer * 256 pointers =
2 KiB, and the L1 data cache of a Pentium 4 is 8 KiB or 16 KiB in
size.
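
To make the arithmetic concrete, here is a small sketch in Python. The 128-entry table is the hypothetical dense layout from the question, not something that exists in the interpreter, and `table_bytes` is just a helper name made up for this example. The opcode count printed depends on the Python version you run it with.

```python
import opcode

POINTER_SIZE = 8          # bytes per pointer on x86_64
L1_SIZES_KIB = (8, 16)    # Pentium 4 L1 data cache sizes mentioned above


def table_bytes(entries, pointer_size=POINTER_SIZE):
    """Size in bytes of a jump table holding one pointer per entry."""
    return entries * pointer_size


full = table_bytes(256)    # one slot per possible opcode byte value
dense = table_bytes(128)   # hypothetical squeezed table (assumption)

print(f"opcodes defined in this interpreter: {len(opcode.opmap)}")
for name, size in (("256-entry", full), ("128-entry", dense)):
    for l1_kib in L1_SIZES_KIB:
        share = size / (l1_kib * 1024)
        print(f"{name} table: {size} bytes, {share:.1%} of a {l1_kib} KiB L1")
```

So the dense table would save 1 KiB at most, and only the slots of opcodes that are actually executed get pulled into the cache in the first place.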
But I don't expect this to be noticeable in most synthetic
microbenchmarks. Matrix multiplication would be the perfect one, I guess:
the repeated column access would kill the L1 data cache if the
matrices don't fit in it entirely.
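
A minimal sketch of the kind of microbenchmark I have in mind, assuming plain lists of lists and a naive triple loop. The function names and the matrix sizes are picked arbitrarily for illustration, and I have not measured anything with it. Note that a Python list stores pointers to float objects, so the actual cache behaviour is messier than in the C equivalent.

```python
import time


def make_matrix(n, value=1.0):
    """Build an n x n matrix as a list of independent row lists."""
    return [[value] * n for _ in range(n)]


def matmul(a, b):
    """Naive matrix product; the inner loop reads b one column at a time."""
    rows, inner, cols = len(a), len(b), len(b[0])
    result = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        row = a[i]
        out = result[i]
        for j in range(cols):
            total = 0.0
            for k in range(inner):
                total += row[k] * b[k][j]   # b[k][j] walks down column j
            out[j] = total
    return result


def time_matmul(n):
    """Wall-clock seconds for one n x n by n x n multiplication."""
    a = make_matrix(n, 1.0)
    b = make_matrix(n, 2.0)
    start = time.perf_counter()
    matmul(a, b)
    return time.perf_counter() - start


if __name__ == "__main__":
    for n in (16, 32, 64):   # arbitrary sizes, chosen only to keep it quick
        print(f"n={n}: {time_matmul(n):.4f} s")
```

Comparing the timings with and without the denser table, at sizes below and above what fits in L1, would show whether the table size matters at all.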

Date | User | Action | Args
2009-01-03 02:20:58 | blaisorblade | set | recipients: + blaisorblade, lemburg, skip.montanaro, rhettinger, pitrou, christian.heimes, alexandre.vassalotti
2009-01-03 02:20:58 | blaisorblade | set | messageid: <1230949258.6.0.586239956138.issue4753@psf.upfronthosting.co.za>
2009-01-03 02:20:58 | blaisorblade | link | issue4753 messages
2009-01-03 02:20:57 | blaisorblade | create |