Title: Move generator specific sections out of ceval.
Type: performance Stage: resolved
Components: Interpreter Core Versions: Python 3.4
Status: closed Resolution: out of date
Dependencies: Superseder:
Assigned To: benjamin.peterson Nosy List: Mark.Shannon, benjamin.peterson, eric.snow, jcea, meador.inge, ncoghlan, ron_adam, serhiy.storchaka, terry.reedy
Priority: normal Keywords: patch

Created on 2011-12-16 00:02 by ron_adam, last changed 2018-09-24 16:56 by serhiy.storchaka. This issue is now closed.

File name Uploaded Description Edit
f_why2.diff ron_adam, 2011-12-24 06:40 Clean up generator.c and ceval.c with better use of why codes. review
exit_code_example.c Mark.Shannon, 2012-02-13 14:42 Example code for returning exit code directly
Messages (10)
msg149583 - (view) Author: Ron Adam (ron_adam) * Date: 2011-12-16 00:02
The following changes cleanup the eval loop and result in a pretty solid 2 to 3% improvement in pybench for me.

And it is about 5% faster for long generators.

* Change why enum type to int and #defines.  And moved the why defines to opcode.h so that they can be seen by the genrator objects after a yield, return, or exception.

* Added an "int f_why" to frames so the generator can see why it returned from a send.

* Refactored generator obj so it can use the "f->f_why" to determine what to do without having to do several checks first.

* Moved the generator specific execption save/swap/and clear out of the cevel main loop.  No need to check for those on every function call.

The only test that fails is the frame size is test_sys.  I left that in for now so someone could check that, and tell me if it's ok to fix it, or if I need to do something else.

I also considered putting the why on the tstate object.  It might save some memory as there wouldn't be as many of those.
msg149591 - (view) Author: Ron Adam (ron_adam) * Date: 2011-12-16 03:48
A simple test to show the difference.


$ python3 -mtimeit "def y(n):" "  for x in range(n):" "    yield  x" "sum(y(10))"
100000 loops, best of 3: 3.87 usec per loop
$ python3 -mtimeit "def y(n):" "  for x in range(n):" "    yield  x" "sum(y(1000000))"
10 loops, best of 3: 186 msec per loop

$ ./python -mtimeit "def y(n):" "  for x in range(n):" "    yield  x" "sum(y(10))"
100000 loops, best of 3: 3.81 usec per loop
$ ./python -mtimeit "def y(n):" "  for x in range(n):" "    yield  x" "sum(y(1000000))"
10 loops, best of 3: 168 msec per loop

                       before   after
y(10)         usec's   3.87    3.81     - 1.55%   
y(1000000)    msec's   186     168      - 9.67%
msg149650 - (view) Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2011-12-17 00:31
Seems mostly fine to me.
msg149821 - (view) Author: Ron Adam (ron_adam) * Date: 2011-12-19 06:04
New diff file.

The main difference is I moved the saved why value to the tstate object instead of the frame object as why_exit.

I'm not seeing the time savings now for some reason.  Maybe the previous increase was a case of coincidental noise. (?)  Still looking for other reasons though.  ;-)

Moving the generator code from the eval loop to the generator object may still be a good thing to do.
msg149953 - (view) Author: Meador Inge (meador.inge) * (Python committer) Date: 2011-12-21 02:10
With the new patch I see no benefits on the same micro-benchmarks you posted (it is even slower for the smaller case) on a quad-core 64-bit F15 box:


$ ./python -mtimeit "def y(n):" "  for x in range(n):" "    yield  x" "sum(y(10))"
1000000 loops, best of 3: 1.33 usec per loop
$ ./python -mtimeit "def y(n):" "  for x in range(n):" "    yield  x" "sum(y(1000000))"
10 loops, best of 3: 66 msec per loop


$ ./python -mtimeit "def y(n):" "  for x in range(n):" "    yield  x" "sum(y(10))"
1000000 loops, best of 3: 1.45 usec per loop
$ ./python -mtimeit "def y(n):" "  for x in range(n):" "    yield  x" "sum(y(1000000))"
10 loops, best of 3: 65.8 msec per loop
msg149954 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2011-12-21 03:10
The thing that most appeals to me with this concept is moving closer to making it possible to experiment with generator-style functionality in *extension* modules (albeit extension modules that are coupled to private CPython APIs). So, for me, "not worse than the status quo" would be the main thing I'd be looking for out of any micro-benchmarks.

However, that also makes me question the movement of the "why_exit" from the frame to the tstate - having it on the thread state is significantly less flexible when it comes to experimenting with execution models.

The move from orthogonal bit flags in a dedicated enum to int fields and macro definitions also seems like a completely unnecessary pessimisation.
msg150026 - (view) Author: Ron Adam (ron_adam) * Date: 2011-12-21 17:59
I think the time benefits I saw are dependent on how the C code is compiled.  So it may be different on different compilers or the same compiler with only a very minor change.

Some of the things I've noticed...

A switch is sometimes slower if it has a "default:" block.

Moving "why = tstate->why_exit;" to just before the if helps a tiny bit.

    why = tstate->why_exit;
    if (why != WHY_EXCEPTION) {

Returning directly out of the WHY_YIELD case.

        case WHY_YIELD:
            return result;

These will be mostly compiler dependent, but they won't slow things down any either.

What I was trying to do is clean up things a bit in ceval.c.  Having the why available on the frame can help by allowing some things to be moved out of ceval.c where it makes sense to do that.

I'll post a new diff tonight with the why_exit moved back to the frame object and the why's back to enums.  Yes, I think the frame object is a good place for it.

One of the odd issues is the opcode and why values sometimes need to be pushed on the value stack.  Which means it needs to be converted to a pyobject first, then converted back after it's pulled off the stack.  I'm looking for a way to avoid that.

I also was able to make fast_block_end section simpler and more understandable without making it slower.  I think it was trying to be too clever in order to save some lines of code.  That makes it very hard to figure out by someone else.

But first I need to finish my Christmas shopping, I'll post a new patch tonight when I get a chance. :-)
msg150216 - (view) Author: Ron Adam (ron_adam) * Date: 2011-12-24 06:40
Updated patch with suggested changes.

It also has a cleaned up fast_block_end section.

Concerning speed. What happens is (as Tim and Raymond have pointed out) that we can make some things a little faster, in exchange for other things being a little slower.  You can play around with the order of the why cases in the fast_block_end section and see that effect.

By using a switch instead of if-else's, that should result in more consistent balance between the block exit cases.  The order I currently have gives a little more priority for exceptions and that seems to help a tiny bit with the ccbench scores.  I think that is a better bench mark than the small micro tests like pybench does.  The problem with pybench is, it doesn't test deeper nesting where these particular changes will have a greater effect.
msg153275 - (view) Author: Mark Shannon (Mark.Shannon) * (Python committer) Date: 2012-02-13 14:42
Why do the codes used internally by ceval have to be the same as those made public?

Have you considered returning the exit code directly, rather than
adding another field to frameobject?

yieldfrom can be handled in a similar way to yield by returning the iterator (instead of the yielded value).

See attached file for example code.
msg312817 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2018-02-25 15:08
The WHY_* codes are gone in 3.8 (see issue17611). Exception state was moved from frame to generator in 3.7 (see issue25612).

Can some parts of the patch be applied to the current code? Or it is completely outdated?
Date User Action Args
2018-09-24 16:56:26serhiy.storchakasetstatus: pending -> closed
resolution: out of date
stage: patch review -> resolved
2018-02-25 15:08:25serhiy.storchakasetstatus: open -> pending
2018-02-25 15:08:13serhiy.storchakasetnosy: + serhiy.storchaka
messages: + msg312817
2012-10-15 19:24:22terry.reedysetversions: + Python 3.4, - Python 3.3
2012-02-13 14:42:15Mark.Shannonsetfiles: + exit_code_example.c
nosy: + Mark.Shannon
messages: + msg153275

2011-12-24 06:41:02ron_adamsetfiles: + f_why2.diff

messages: + msg150216
2011-12-24 06:05:28ron_adamsetfiles: - f_why1.diff
2011-12-21 18:26:14eric.snowsetnosy: + eric.snow
2011-12-21 17:59:48ron_adamsetmessages: + msg150026
2011-12-21 03:10:59ncoghlansetmessages: + msg149954
2011-12-21 02:10:37meador.ingesetmessages: + msg149953
2011-12-21 02:03:32meador.ingesetnosy: + meador.inge
2011-12-19 06:04:28ron_adamsetfiles: + f_why1.diff

messages: + msg149821
2011-12-19 05:28:50ron_adamsetfiles: - f_why.diff
2011-12-17 00:31:16benjamin.petersonsetassignee: benjamin.peterson
messages: + msg149650
2011-12-17 00:29:00terry.reedysetnosy: + terry.reedy
2011-12-16 17:04:04jceasetnosy: + jcea
2011-12-16 06:51:15pitrousetnosy: + ncoghlan, benjamin.peterson

stage: patch review
2011-12-16 03:48:32ron_adamsetmessages: + msg149591
2011-12-16 00:02:53ron_adamcreate