Message 78688 - Python tracker


Author	blaisorblade
Recipients	alexandre.vassalotti, arigo, blaisorblade, christian.heimes, lemburg, pitrou, rhettinger, skip.montanaro
Date	2009-01-01.04:36:55
Message-id	<1230784617.49.0.50655991643.issue4753@psf.upfronthosting.co.za>
In-reply-to

Content
> You may want to check out issue1408710 in which a similar patch was
> provided, but failed to deliver the desired results.
It's not really similar, because you don't duplicate the dispatch code.
It took me some time to understand why you didn't change the "goto
fast_next_opcode", but that's where you miss the speedup.
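
To make the distinction concrete, here is a minimal sketch of the two layouts. It is not code from either patch: the toy opcodes, the accumulator and the function names are all made up for illustration, and it assumes GCC or Clang, since it relies on the "labels as values" extension.

```c
/* Assumes GCC or Clang: uses the "labels as values" extension
   (&&label and goto *ptr). All names below are hypothetical. */

enum { OP_INC, OP_DEC, OP_DOUBLE, OP_HALT };   /* toy opcodes */

/* Variant A: computed goto, but a single shared dispatch point.
   Every handler jumps back to fast_next_opcode, so there is still only
   one indirect jump in the loop; compared with a switch, this layout
   only drops the range check. */
int run_shared_dispatch(const unsigned char *code)
{
    static void *const targets[] = {
        &&op_inc, &&op_dec, &&op_double, &&op_halt
    };
    int acc = 0;

fast_next_opcode:
    goto *targets[*code++];

op_inc:    acc += 1; goto fast_next_opcode;
op_dec:    acc -= 1; goto fast_next_opcode;
op_double: acc *= 2; goto fast_next_opcode;
op_halt:   return acc;
}

/* Variant B: the dispatch code is duplicated at the end of every
   handler, so each opcode ends with its own indirect jump, which the
   branch predictor can track separately. */
#define DISPATCH() goto *targets[*code++]

int run_threaded(const unsigned char *code)
{
    static void *const targets[] = {
        &&op_inc, &&op_dec, &&op_double, &&op_halt
    };
    int acc = 0;

    DISPATCH();

op_inc:    acc += 1; DISPATCH();
op_dec:    acc -= 1; DISPATCH();
op_double: acc *= 2; DISPATCH();
op_halt:   return acc;
}

#undef DISPATCH
```

Both functions compute the same result for the same bytecode; only the shape of the generated jumps differs. Note that a compiler is free to merge the duplicated jumps of variant B back into one, so it is worth checking the generated assembly.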

The only thing your change does is save the range check for the
switch, so I guess the slowdown comes from some minor change in the
code that GCC generates.

Anyway, this suggests that the speedup really comes from better branch
prediction and not from saving the range check. The 1st paper I
mentioned simply states that saving the range check might make a small
difference. The point is that sometimes, when you are going to flush
the pipeline anyway, adding a few instructions, even conditional
jumps, does not make a difference. I've observed this behaviour quite a
few times while building a small Python interpreter from scratch.

I guess (but this might be wrong) that's because the execution units
were not fully used, and adding conditional jumps doesn't make a
difference because flushing the pipeline once or twice costs almost
the same (the second flush discards just a few instructions). Or
something like that; I'm not enough of an expert on CPU architecture
to be sure of such guesses.

History
Date	User	Action	Args
2009-01-01 04:36:57	blaisorblade	set	recipients: + blaisorblade, lemburg, skip.montanaro, arigo, rhettinger, pitrou, christian.heimes, alexandre.vassalotti
2009-01-01 04:36:57	blaisorblade	set	messageid: <1230784617.49.0.50655991643.issue4753@psf.upfronthosting.co.za>
2009-01-01 04:36:56	blaisorblade	link	issue4753 messages
2009-01-01 04:36:55	blaisorblade	create