Author Mark.Shannon
Recipients Mark.Shannon, pablogsal
Date 2021-08-24.10:18:35
The two plausible layouts for evaluation stack frames are described here:

We opted for layout A, although it is a bit more complex to manage and slightly more expensive in terms of pointers, because it theoretically allows zero-copy Python-to-Python calls.

I now believe this was the wrong decision and we should have chosen layout B.

B is cheaper. It needs 2 pointers, not 3, meaning that there is another register available for use in the interpreter.
Also the linkage area doesn't need the nlocalsplus field.
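
As a rough sketch of the pointer count: the layout descriptions are not reproduced in this message, so the model below assumes that layout A places the locals before the linkage area and the stack after it, while layout B places the linkage area first, followed contiguously by the locals and the stack. LINKAGE_SIZE and all function names are hypothetical and are not CPython definitions.

```python
# Hypothetical model: a frame is a run of slots in a flat array of
# object pointers. Both layouts and LINKAGE_SIZE are assumptions made
# for illustration; they are not taken from the CPython sources.
LINKAGE_SIZE = 4  # assumed number of slots in the linkage area


def locate_a(base, nlocalsplus):
    """Assumed layout A: [locals][linkage][stack]."""
    locals_ptr = base
    frame_ptr = base + nlocalsplus        # position depends on nlocalsplus
    stack_ptr = frame_ptr + LINKAGE_SIZE
    return locals_ptr, frame_ptr, stack_ptr  # three pointers to track


def locals_from_frame_a(frame_ptr, nlocalsplus):
    """In A, finding the locals from the frame needs a stored nlocalsplus."""
    return frame_ptr - nlocalsplus


def locate_b(base, nlocalsplus):
    """Assumed layout B: [linkage][locals][stack]."""
    frame_ptr = base
    stack_ptr = base + LINKAGE_SIZE + nlocalsplus
    return frame_ptr, stack_ptr           # two pointers to track


def locals_from_frame_b(frame_ptr):
    """In B, the locals sit at a fixed offset from the frame pointer."""
    return frame_ptr + LINKAGE_SIZE
```

Under these assumptions, B derives the locals pointer from the frame pointer with a constant offset, which is why a separate locals pointer and the nlocalsplus field would both be unnecessary.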

The benefit of zero-copy calls is much smaller than I thought:
* Calls from generator functions do not benefit
* An additional check is needed to make sure that both frames are in the same stack chunk
* Any jitted code will keep stack values in registers, so stores will still be needed in either case.
* The average number of arguments copied is low (typically 2 or 3).

Even in the ideal case (interpreter, no generator, same stack chunk) changing to layout B
will cost 2 or 3 memory moves (independent of each other), but will save the extra code for checking chunks, and one move (moving nlocalsplus). So at best layout A only saves 1 or 2 moves.
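
The arithmetic behind that estimate can be written out as a small sketch. The function names are hypothetical, and the counts only restate the figures given above (one move for nlocalsplus under layout A, one move per argument under layout B); the chunk check that layout A also needs is not counted as a move.

```python
def moves_layout_a():
    """Ideal case for layout A: arguments stay in place, nlocalsplus is stored."""
    return 1


def moves_layout_b(nargs):
    """Layout B copies each argument into the callee's locals."""
    return nargs


def moves_saved_by_a(nargs):
    """Net memory moves that layout A avoids in its ideal case."""
    return moves_layout_b(nargs) - moves_layout_a()


# Typical argument counts quoted in the message.
savings = [moves_saved_by_a(n) for n in (2, 3)]
```

For the typical 2 or 3 arguments, `savings` comes out as 1 and 2 moves, matching the figure above.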

In other cases layout B is better.

One final improvement to layout B: saving the stackdepth as an offset from locals[0] not from stack[0] further speeds frame handling.
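
One plausible reading of this improvement, sketched under the assumption that the locals and the stack are contiguous in layout B: an offset measured from locals[0] lets the stack top be recovered with a single addition and without consulting nlocalsplus. The function names are hypothetical.

```python
def stack_top_from_stack_base(locals_base, nlocalsplus, depth):
    """Depth stored relative to stack[0]: needs nlocalsplus to find stack[0]."""
    stack_base = locals_base + nlocalsplus
    return stack_base + depth


def stack_top_from_locals_base(locals_base, depth_from_locals):
    """Depth stored relative to locals[0]: a single addition."""
    return locals_base + depth_from_locals


# The two encodings describe the same slot when
# depth_from_locals == nlocalsplus + depth.
locals_base, nlocalsplus, depth = 100, 5, 2
same_slot = (
    stack_top_from_stack_base(locals_base, nlocalsplus, depth)
    == stack_top_from_locals_base(locals_base, nlocalsplus + depth)
)
```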
Date                 User          Action  Args
2021-08-24 10:18:35  Mark.Shannon  set     recipients: + Mark.Shannon, pablogsal
2021-08-24 10:18:35  Mark.Shannon  set     messageid: <>
2021-08-24 10:18:35  Mark.Shannon  link    issue44990 messages
2021-08-24 10:18:35  Mark.Shannon  create