More opcode predictions #71442
Comments
Currently the PREDICT() macro is used for predicting the following pair of opcodes in the ceval loop: LIST_APPEND JUMP_ABSOLUTE

I collected static and dynamic opcode statistics. Static statistics describe which opcodes appear in code objects after running the compileall script on the Lib directory. Dynamic statistics describe which opcodes are executed when running all the Python tests. The resulting numbers are close, but there are some reasonable differences (opcodes used in loops have larger numbers in the dynamic statistics, and opcodes used for creating classes and functions have larger numbers in the static statistics). In the following lines the first number is from the dynamic statistics and the second number is from the static statistics.

LIST_APPEND, SET_ADD and MAP_ADD are always followed by JUMP_ABSOLUTE, so according to these statistics the existing predictions are correct. But the statistics suggest other predictions as well. JUMP_ABSOLUTE is followed by FOR_ITER in 77% (0%). The proposed patch adds predictions for the following pairs: DUP_TOP_TWO BINARY_SUBSCR

Note that the effect of this change is not very significant, since the PREDICT() macro is a no-op if computed GOTOs are used.
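For readers unfamiliar with the mechanism, here is a minimal, self-contained sketch of the PREDICT() technique in the spirit of Python/ceval.c. The opcodes and the toy interpreter are invented for illustration, and the real macros differ in detail across CPython versions; only the shape of the trick is the point. When the guess is right, the handler jumps straight to its successor's code and skips one trip through the switch dispatch; when it is wrong, control falls through to the normal dispatch at the cost of one failed comparison.

```c
/* Toy interpreter demonstrating the PREDICT() idea (illustrative only;
   opcode names and program are invented, not CPython's). */
#include <stdio.h>

enum { OP_LOAD1, OP_ADD, OP_PRINT, OP_HALT };

/* Peek at the next opcode; on a hit, consume it and jump straight to
   its handler, skipping the switch dispatch entirely. */
#define PREDICT(op) \
    do { \
        if (*next_instr == op) { \
            next_instr++; \
            goto PRED_##op; \
        } \
    } while (0)
#define PREDICTED(op)  PRED_##op

int main(void)
{
    unsigned char program[] = { OP_LOAD1, OP_ADD, OP_LOAD1, OP_ADD,
                                OP_PRINT, OP_HALT };
    unsigned char *next_instr = program;
    int acc = 0, tmp = 0;

    for (;;) {
        unsigned char opcode = *next_instr++;
        switch (opcode) {

        case OP_LOAD1:
            tmp = 1;
            PREDICT(OP_ADD);   /* OP_LOAD1 is almost always followed by OP_ADD */
            break;

        PREDICTED(OP_ADD):     /* label targeted by a successful prediction */
        case OP_ADD:
            acc += tmp;
            break;

        case OP_PRINT:
            printf("acc = %d\n", acc);   /* prints acc = 2 */
            break;

        case OP_HALT:
            return 0;
        }
    }
}
```

On a miss, the only direct cost is the failed comparison, plus whatever pressure the extra conditional branch puts on the hardware's branch-prediction tables; that cost/benefit tradeoff is exactly what the rest of this thread debates.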
Serhiy, please slow down and stop rewriting every single thing you see. Your rate of patches is prolific and hard to digest. Please give some consideration that the people who came before you (lots of them) put a lot of thought into what was done and were deliberately conservative.

Dynamic statistics can vary widely depending on the code being profiled (much like PGO optimizations for C compilers). For a prediction to have value, it needs to involve a commonly used opcode, and the pair must be highly predictive because of an intrinsic relationship rather than a coincidental one. FWIW, these tests aren't free. Global prediction tables have a limited size and are subject to aliasing. Mispredictions are expensive. Also, the ceval loop doesn't need more clutter.

The new opcodes GET_YIELD_FROM_ITER, GET_AITER, GET_ANEXT, and GET_AWAITABLE haven't been considered before. The code in Python/compile.c shows that LOAD_CONST is the only possible next opcode, so these are reasonable candidates (i.e. the pair effectively acts as a single opcode). Of these, I think only GET_ANEXT and GET_AWAITABLE are likely to occur in a loop. So, these two may be worthwhile. All the rest should probably be skipped.
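The remark above about prediction tables connects to why PREDICT() is defined away under computed GOTOs. Below is a hedged sketch of the computed-goto dispatch style, using the GCC/Clang labels-as-values extension (the same extension behind CPython's USE_COMPUTED_GOTOS build) and the same invented toy opcodes as in the earlier sketch. Because each handler ends with its own indirect jump, the CPU can learn a separate successor pattern per opcode, so the explicit PREDICT() test adds nothing in this build mode.

```c
/* Toy computed-goto dispatch (GCC/Clang labels-as-values extension;
   illustrative only, not CPython's actual ceval loop). */
#include <stdio.h>

enum { OP_LOAD1, OP_ADD, OP_PRINT, OP_HALT };

int main(void)
{
    /* One label address per opcode, indexed by opcode value. */
    static void *dispatch_table[] = {
        &&op_load1, &&op_add, &&op_print, &&op_halt
    };
    unsigned char program[] = { OP_LOAD1, OP_ADD, OP_LOAD1, OP_ADD,
                                OP_PRINT, OP_HALT };
    unsigned char *next_instr = program;
    int acc = 0, tmp = 0;

#define DISPATCH() goto *dispatch_table[*next_instr++]

    DISPATCH();        /* enter the loop at the first opcode */

op_load1:
    tmp = 1;
    DISPATCH();        /* per-handler indirect jump: the hardware gets its
                          own predictor slot for "what follows OP_LOAD1" */
op_add:
    acc += tmp;
    DISPATCH();
op_print:
    printf("acc = %d\n", acc);   /* prints acc = 2 */
    DISPATCH();
op_halt:
    return 0;
}
```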
Since the existing predictions were added, many new opcodes and opcode combinations have been introduced, so the predictions need to be updated to reflect the current status. The patch contained only predictions for highly predictive (>=94%) pairs of commonly used opcodes, or 100% predictive pairs of uncommonly used opcodes (which can be more common in specific applications).

The second version of the patch contains only 100% predictions. It doesn't depend on any statistics: the compiler always generates these opcodes in pairs.
The second version of the patch mostly looks fine. The prediction from DUP_TOP_TWO to BINARY_SUBSCR is questionable. Because compile.c uses it in only one context, the prediction rate is 100%; however, there is no intrinsic relationship between the two opcodes, so the prediction is happenstance and fragile. Reconsider whether you really want this prediction pairing.
New changeset f19c2b28710e by Serhiy Storchaka in branch 'default':
Thank you Raymond. Committed without the prediction from DUP_TOP_TWO to BINARY_SUBSCR. What do you think about the following highly predictive pairs?
FYI, here is a list of the most common pairs (predicted pairs are starred).
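As a hedged illustration of how a pair list like this can be produced, here is a successor-pair counter bolted onto the toy interpreter from the earlier sketch. CPython itself has long had a comparable facility behind the DYNAMIC_EXECUTION_PROFILE / DXPAIRS build flags (exposed as sys.getdxp()); the code below is only a stand-in with invented opcodes.

```c
/* Toy collection of dynamic opcode-pair statistics (illustrative only). */
#include <stdio.h>

enum { OP_LOAD1, OP_ADD, OP_PRINT, OP_HALT, NOPCODES };

/* pair_counts[a][b] counts how often opcode b directly follows opcode a */
static unsigned long pair_counts[NOPCODES][NOPCODES];

int main(void)
{
    unsigned char program[] = { OP_LOAD1, OP_ADD, OP_LOAD1, OP_ADD,
                                OP_PRINT, OP_HALT };
    unsigned char *next_instr = program;
    int acc = 0, tmp = 0, prev = -1;

    for (;;) {
        unsigned char opcode = *next_instr++;
        if (prev >= 0)
            pair_counts[prev][opcode]++;   /* record the (prev, current) pair */
        prev = opcode;

        switch (opcode) {
        case OP_LOAD1: tmp = 1;            break;
        case OP_ADD:   acc += tmp;         break;
        case OP_PRINT: printf("acc = %d\n", acc); break;
        case OP_HALT:
            /* Dump the pair table; a pair that dominates its row is a
               candidate for a PREDICT() pairing. */
            for (int i = 0; i < NOPCODES; i++)
                for (int j = 0; j < NOPCODES; j++)
                    if (pair_counts[i][j])
                        printf("pair %d -> %d: %lu\n", i, j, pair_counts[i][j]);
            return 0;
        }
    }
}
```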
The BUILD_SLICE pairing with BINARY_SUBSCR has a tight conceptual linkage, with the lookup case dominating use patterns over slice deletion and slice assignment. IIRC, I passed on that pairing long ago because even though the predictive power was good, it didn't come up much (a very, very low percentage of executed opcodes in any real programs). That would mean the user wouldn't be likely to see any benefit, and I worried about the cost (global prediction tables have a finite size and are subject to aliased false positives, so when in doubt, it is better not to do it at all).

The UNPACK_SEQUENCE pairing with STORE_FAST had the opposite issue. The two arise in real code often enough to be interesting, but I wasn't as confident that it wasn't competing with alternative successors like STORE_NAME or attribute storage. Here, the statistics are heavily influenced by whatever is being profiled. Given that I wasn't confident in the pairing, I passed on it.

That said, there's nothing terribly wrong with either, so it is hard to veto them. Now, as well as back when the opcode predictions were first studied, I remain -0 on both pairings. In general, we should have a bias towards there being relatively few predictions (each one adds size to the generated code, adds load to the branch prediction tables, and adds to the cognitive load of anyone modifying the compiler, the peephole optimizer, or adding new opcodes), and there should be a preference to leave out cases where there is doubt.

I'll leave it to you to decide whether either one of these should go in (I just wanted you to know why they weren't included in the first place).
OK, I'm only +0 on adding these predictions and fine with the status quo. The 28:1 ratio looks compelling to me, but none of this is important enough to press further.