Message282071
> I think that clearing 120 bytes at a time is faster than clearing it later entry-by-entry.
Ah, my wording was wrong: this patch skips the zero-clear entirely.
In pseudo code:
// When allocating PyDictKeyObject.
- memset(dk_entries, 0, sizeof(dk_entries));
// When inserting new item.
n = dk_nentries++;
e = &dk_entries[n];
e->me_hash = hash;
e->me_key = key;
if (split_table) {
+ e->me_value = NULL;
ma_values[n] = value;
} else {
e->me_value = value;
}
> Your patch removes some asserts, this looks not good.
This patch fills dk_entries with 0xcc when Py_DEBUG is enabled.
That can detect unintentional reads of values coming from reused memory.
I'll look for more places where I can add effective asserts.
> Could you provide microbenchmarks that show the largest speedup and the largest slowdown? So we would see what type of code gets the benefit.
Avoiding cache pollution matters more than avoiding the 120-byte memset in this case.
It's difficult to write a simple microbenchmark that shows the effects of cache pollution...
$ ./python-patched -m perf timeit --rigorous --compare-to `pwd`/python-default --duplicate 8 -- '{}'
python-default: ......................................... 44.6 ns +- 2.4 ns
python-patched: ......................................... 44.1 ns +- 1.8 ns
Median +- std dev: [python-default] 44.6 ns +- 2.4 ns -> [python-patched] 44.1 ns +- 1.8 ns: 1.01x faster
Date                | User    | Action | Args
2016-11-30 10:08:01 | methane | set    | recipients: + methane, vstinner, serhiy.storchaka
2016-11-30 10:08:01 | methane | set    | messageid: <1480500481.1.0.711914034845.issue28832@psf.upfronthosting.co.za>
2016-11-30 10:08:01 | methane | link   | issue28832 messages
2016-11-30 10:08:00 | methane | create |