
Author pitrou
Recipients mark.dickinson, pitrou, rhettinger, tim.peters
Date 2010-11-13.23:12:16
Message-id <1289689933.3561.7.camel@localhost.localdomain>
In-reply-to <1289689354.8.0.363857776985.issue10408@psf.upfronthosting.co.za>
Content
> My previous experiments along these lines showed it was a dead-end.
> The number of probes was the most important factor and beat-out any
> effort to improve cache utilization from increased density.  

Can you describe your experiments? What workloads or benchmarks did you
use?

Do note that modern CPUs have several levels of cache. L1 is very fast
(3-4 cycle latency) but rather small (32 or 64 KB). L2, depending on the
CPU, has a latency between 10 and 20+ cycles and ranges from 256 KB to
1 MB. L3, when present, is considerably larger but also much slower
(latency sometimes up to 50 cycles).
So, even if access patterns are uneven, it is probably rare for all
frequently accessed data to fit in L1 (especially with Python, since
objects are big).
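
As a rough illustration of those latency cliffs (a sketch I put together
for this message, not a rigorous benchmark; the function name and sizes
are placeholders), a pointer-chasing loop over growing working sets
should show the time per access climbing as the data spills out of L1,
then L2, then L3. In pure Python the interpreter overhead blurs the
steps, but the upward trend is usually still visible:

    import array
    import random
    import time

    def time_per_access(n_items, n_steps=1_000_000):
        """Average time to follow one random pointer in an n_items cycle."""
        # Build a random cyclic permutation: each element stores the index
        # of the next one, so every access depends on the previous access
        # and hardware prefetching cannot hide the memory latency.
        order = list(range(n_items))
        random.shuffle(order)
        nxt = array.array('q', [0] * n_items)   # 'q' = 8 bytes per element
        for i in range(n_items):
            nxt[order[i]] = order[(i + 1) % n_items]
        j = 0
        t0 = time.perf_counter()
        for _ in range(n_steps):
            j = nxt[j]
        return (time.perf_counter() - t0) / n_steps

    # Working sets from 8 KB (fits in L1) up to 32 MB (beyond most L3s).
    for exp in range(10, 23, 2):
        n = 2 ** exp
        print(f"{n * 8 // 1024:6d} KB: {time_per_access(n) * 1e9:6.1f} ns/access")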

> Another result from earlier experiments is that benchmarking the
> experiment is laden with pitfalls.  Tight timing loops don't mirror
> real world programs, nor do access patterns with uniform random
> distributions.

I can certainly understand that; can you suggest workloads approaching
"real world programs"?
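
For instance (a sketch of my own, with made-up names and parameters,
rather than anything proposed in this thread), a Zipf-like key mix seems
closer to real programs than uniform sampling: a few hot keys get
hammered while the long tail is touched only rarely.

    import random
    import time

    def zipf_workload(n_keys=100_000, n_ops=1_000_000, s=1.2):
        """Time n_ops dict updates whose keys follow a Zipf-ish distribution."""
        # Weight rank k proportionally to 1 / k**s: rank 1 is by far the
        # hottest key, and the tail falls off quickly.
        weights = [1.0 / (k ** s) for k in range(1, n_keys + 1)]
        keys = [f"key{k}" for k in range(n_keys)]
        counts = dict.fromkeys(keys, 0)
        sample = random.choices(keys, weights=weights, k=n_ops)
        t0 = time.perf_counter()
        for key in sample:
            counts[key] += 1
        return time.perf_counter() - t0

    print(f"{zipf_workload():.3f}s for 1M skewed dict updates")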