Message282071
> I think that clearing 120 bytes at a time is faster than clearing it later entry-by-entry.
Ah, my wording was wrong: this patch skips the zero-clear entirely.
In pseudo code:
// When allocating PyDictKeyObject.
- memset(dk_entries, 0, sizeof(dk_entries));
// When inserting new item.
n = dk_nentries++;
e = &dk_entries[n];
e->me_hash = hash;
e->me_key = key;
if (split_table) {
+ e->me_value = NULL;
ma_values[n] = value;
} else {
e->me_value = value;
}
> Your patch removes some asserts, this looks not good.
This patch fills dk_entries with 0xcc when Py_DEBUG is enabled.
That can detect unintentional reads of values coming from reused memory.
I'll look for more places where I can add effective asserts.
> Could you provide microbenchmarks that show the largest speedup and the largest slowdown? So we would see what type of code gets the benefit.
Avoiding cache pollution matters more than avoiding the 120-byte memset in this case.
It's difficult to write a simple microbenchmark that shows the effects of cache pollution...
$ ./python-patched -m perf timeit --rigorous --compare-to `pwd`/python-default --duplicate 8 -- '{}'
python-default: ......................................... 44.6 ns +- 2.4 ns
python-patched: ......................................... 44.1 ns +- 1.8 ns
Median +- std dev: [python-default] 44.6 ns +- 2.4 ns -> [python-patched] 44.1 ns +- 1.8 ns: 1.01x faster
Date                | User    | Action | Args
2016-11-30 10:08:01 | methane | set    | recipients: + methane, vstinner, serhiy.storchaka
2016-11-30 10:08:01 | methane | set    | messageid: <1480500481.1.0.711914034845.issue28832@psf.upfronthosting.co.za>
2016-11-30 10:08:01 | methane | link   | issue28832 messages
2016-11-30 10:08:00 | methane | create |