Message 366825 - Python tracker

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Author	steve.dower
Recipients	Mark.Shannon, carljm, corona10, dino.viehland, eelizondo, gregory.p.smith, nascheme, pablogsal, pitrou, shihai1991, steve.dower, tim.peters, vstinner
Date	2020-04-20.13:33:01
SpamBayes Score	-1.0
Marked as misclassified	Yes
Message-id	<1587389581.51.0.25436145899.issue40255@roundup.psfhosted.org>
In-reply-to

Content
> I would expect that the negative impact on branch predictability would easily outweigh the cost of the memory write (A guaranteed L1 hit) If that were true then Spectre and Meltdown wouldn't have been so interesting :) Pipelining processors are going to speculatively execute both paths, and will skip the write much more quickly than by doing it, and meanwhile nobody should have tried to read the value so it hasn't had to block for that path. I'm not aware of any that detect no-op writes and skip synchronising across cores - the dirty bit of the cache line is just set unconditionally. Benchmarking already showed that the branching version is faster. It's possible that "refcount += (refcount & IMMORTAL) ? 0 : 1" could generate different code (should be mov,test,lea,cmovz rather than mov,and,add,mov or mov,and,jz,add,mov), but it's totally reasonable for a branch to be faster than unconditionally modifying memory.

> I would expect that the negative impact on branch predictability would easily outweigh the cost of the memory write (A guaranteed L1 hit)

If that were true then Spectre and Meltdown wouldn't have been so interesting :)

Pipelining processors are going to speculatively execute both paths, and will skip the write much more quickly than by doing it, and meanwhile nobody should have tried to read the value so it hasn't had to block for that path. I'm not aware of any that detect no-op writes and skip synchronising across cores - the dirty bit of the cache line is just set unconditionally.

Benchmarking already showed that the branching version is faster. It's possible that "refcount += (refcount & IMMORTAL) ? 0 : 1" could generate different code (should be mov,test,lea,cmovz rather than mov,and,add,mov or mov,and,jz,add,mov), but it's totally reasonable for a branch to be faster than unconditionally modifying memory.

History
Date	User	Action	Args
2020-04-20 13:33:01	steve.dower	set	recipients: + steve.dower, tim.peters, nascheme, gregory.p.smith, pitrou, vstinner, carljm, dino.viehland, Mark.Shannon, corona10, pablogsal, eelizondo, shihai1991
2020-04-20 13:33:01	steve.dower	set	messageid: <1587389581.51.0.25436145899.issue40255@roundup.psfhosted.org>
2020-04-20 13:33:01	steve.dower	link	issue40255 messages
2020-04-20 13:33:01	steve.dower	create