Unify handling of stats in the CPython VM #90230

markshannon · 2021-12-14T16:17:54Z

BPO	46072
Nosy	@tiran, @markshannon, @brandtbucher
PRs	bpo-46072: Add --with-pystats configure option to simplify gathering of VM stats #30116 bpo-46072: Document --enable-stats option. #30139 bpo-46072: Better randomization of stats filenames. #30145 bpo-46072: Add top level stats struct #30169 bpo-46072: Add simple stats for Python calls. #30989 bpo-46072: Add some object layout stats #31051 bpo-46072: Add some frame stats. #31060 bpo-46072: Add stats for FOR_ITER. #31079 bpo-46072: Improve LOAD_METHOD stats #31104 bpo-46072: Collect stats for UNPACK_SEQUENCE. #31105 bpo-46072: Add per-instruction miss stats. #31108 bpo-46072: Merge dxpairs into py_stats. #31197 bpo-46072: Print summary stats for overall success of specialization. #31211 bpo-46072: Output stats as markdown with collapsible sections. #31228 bpo-46072: Right justify numeric columns in stats summary script. #31234 bpo-46072: Add stats for PRECALL_FUNCTION. #31250 bpo-46072: Fix sys.getdxp(). #31251 bpo-46072: Include length in stats for UNPACK_SEQUENCE. #31254 bpo-46072: Gather stats for PRECALL_METHOD. #31259 bpo-46072: Add detailed failure stats for `BINARY_OP` #31289 bpo-46072: Add pair counts to stats output and summary. #31324

^{Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.}

Show more details

GitHub fields:

assignee = None
closed_at = None
created_at = <Date 2021-12-14.16:17:53.632>
labels = []
title = 'Unify handling of stats in the CPython VM'
updated_at = <Date 2022-02-16.16:50:02.905>
user = 'https://github.com/markshannon'

bugs.python.org fields:

activity = <Date 2022-02-16.16:50:02.905>
actor = 'brandtbucher'
assignee = 'none'
closed = False
closed_date = None
closer = None
components = []
creation = <Date 2021-12-14.16:17:53.632>
creator = 'Mark.Shannon'
dependencies = []
files = []
hgrepos = []
issue_num = 46072
keywords = ['patch']
message_count = 14.0
messages = ['408543', '408613', '408618', '408702', '408704', '408708', '408787', '412002', '412271', '412350', '412750', '412769', '412901', '413341']
nosy_count = 3.0
nosy_names = ['christian.heimes', 'Mark.Shannon', 'brandtbucher']
pr_nums = ['30116', '30139', '30145', '30169', '30989', '31051', '31060', '31079', '31104', '31105', '31108', '31197', '31211', '31228', '31234', '31250', '31251', '31254', '31259', '31289', '31324']
priority = 'normal'
resolution = None
stage = 'patch review'
status = 'open'
superseder = None
type = None
url = 'https://bugs.python.org/issue46072'
versions = []

The text was updated successfully, but these errors were encountered:

markshannon · 2021-12-14T16:17:54Z

By "stats" I mean the internal numbers gathered by the VM for performance and debugging. This has nothing to do with any statistics module.

Currently various parts of the VM gather stats: the GC, dicts, the bytecode interpreter, type lookup cache, etc.

These stats have various compile time flags to turn them on and off. They have differing ways to display the stats, and no unified way to gather stats across different runs.

For the specialization stats we dump stats, which we can parse to collate stats across runs.

We should:

Add a --with-pystats config flag (like with-pydebug) to turn on stat gathering at build time.
Convert the other stats to the format used by the specialization stats, so all stats can be parsed and collated.

markshannon · 2021-12-15T15:32:41Z

New changeset 342b93f by Mark Shannon in branch 'main':
bpo-46072: Add --with-pystats configure option to simplify gathering of VM stats (GH-30116)
342b93f

tiran · 2021-12-15T17:14:09Z

Could you please add the new option to Doc/using/configure.rst ?

markshannon · 2021-12-16T13:41:07Z

New changeset 4506bbe by Mark Shannon in branch 'main':
bpo-46072: Document --enable-stats option. (GH-30139)
4506bbe

tiran · 2021-12-16T14:25:02Z

I just noticed that you are using hard-coded paths with /tmp for the pystats directory. That's problematic and opens the possibility of a symlink race attack.

Could please add exclusive create to _Py_PrintSpecializationStats()? The will prevent symlink attacks. fopen() mode "x" is not generally available in all libcs. You have to combine open() and fdopen():

int flags = O_WRONLY | O_CREAT | O_EXCL;
#ifdef O_NOFOLLOW
flags |= O_NOFOLLOW;
#endif
#ifdef O_CLOEXEC
flags |= O_CLOEXEC;
#endif

int fd = open(path, flags);
if (fd >= 0) {
    FILE *fout = fdopen(fd, "w");
}

markshannon · 2021-12-16T14:47:14Z

The --enable-stats option is for CPython development and shouldn't be turned on for a release version, so I'm not really concerned about people attacking their own machines.

markshannon · 2021-12-17T14:48:11Z

New changeset efd6236 by Mark Shannon in branch 'main':
bpo-46072: Add top level stats struct (GH-30169)
efd6236

markshannon · 2022-01-28T15:20:56Z

New changeset 90ab138 by Mark Shannon in branch 'main':
bpo-46072: Add simple stats for Python calls. (GH-30989)
90ab138

markshannon · 2022-02-01T15:05:39Z

New changeset 48be46e by Mark Shannon in branch 'main':
bpo-46072: Add some object layout and allocation stats (GH-31051)
48be46e

markshannon · 2022-02-02T11:02:01Z

New changeset 187930f by Mark Shannon in branch 'main':
bpo-46072: Add some frame stats. (GH-31060)
187930f

markshannon · 2022-02-07T14:31:01Z

New changeset 062460e by Mark Shannon in branch 'main':
bpo-46072: Improve LOAD_METHOD stats (GH-31104)
062460e

markshannon · 2022-02-07T16:51:53Z

New changeset 9c979d7 by Mark Shannon in branch 'main':
bpo-46072: Merge dxpairs into py_stats. (GH-31197)
9c979d7

markshannon · 2022-02-09T12:30:39Z

New changeset f71a69a by Mark Shannon in branch 'main':
bpo-46072: Output stats as markdown with collapsible sections. (GH-31228)
f71a69a

brandtbucher · 2022-02-16T16:50:03Z

New changeset 580cd9a by Brandt Bucher in branch 'main':
bpo-46072: Add detailed failure stats for BINARY_OP (GH-31289)
580cd9a

penguin-wwy · 2022-04-29T03:24:35Z

Python API should be provided? This allows us to have more accurate statistical information about the status.

For example, in pyperformance bm_hexiom:

import _opcode as stats
def main(loops, level):
    ...
    ...

    stats.init_specialization_stats() # Clear unrelated status information
    stats.enable_specialization_stats()
    for _ in range_it: # core loop of benchmark
        stream = io.StringIO()
        solve_file(board, strategy, order, stream)
        output = stream.getvalue()
        stream = None
    stats.disable_specialization_stats()
    print(stats.get_specialization_stats())# Print status information
    ...
    ...
    return dt

Then get stats dict:

{'load_attr': {'success': 55, 'failure': 4, 'hit': 72355, 'deferred': 168, 'miss': 0, 'deopt': 0, 'failure_kinds': (0, 0, 4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)}, 'load_global': {'success': 81, 'failure': 0, 'hit': 47503, 'deferred': 0, 'miss': 0, 'deopt': 0, 'failure_kinds': (0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)}, 'load_method': {'success': 36, 'failure': 0, 'hit': 36698, 'deferred': 0, 'miss': 0, 'deopt': 0, 'failure_kinds': (0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)}, 'binary_subscr': {'success': 47, 'failure': 48, 'hit': 78278, 'deferred': 2488, 'miss': 0, 'deopt': 0, 'failure_kinds': (0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 40, 0, 4, 4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)}, 'store_subscr': {'success': 9, 'failure': 0, 'hit': 3610, 'deferred': 0, 'miss': 0, 'deopt': 0, 'failure_kinds': (0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)}, 'store_attr': {'success': 9, 'failure': 0, 'hit': 671, 'deferred': 0, 'miss': 0, 'deopt': 0, 'failure_kinds': (0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)}, 'call': {'success': 54, 'failure': 9, 'hit': 36321, 'deferred': 1079, 'miss': 783, 'deopt': 14, 'failure_kinds': (0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 7, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)}, 'binary_op': {'success': 43, 'failure': 4, 'hit': 11646, 'deferred': 16, 'miss': 0, 'deopt': 0, 'failure_kinds': (0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0)}, 'compare_op': {'success': 37, 'failure': 480, 'hit': 24938, 'deferred': 30707, 'miss': 0, 'deopt': 0, 'failure_kinds': (0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 480, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)}, 'unpack_sequence': {'success': 2, 'failure': 0, 'hit': 122, 'deferred': 0, 'miss': 0, 'deopt': 0, 'failure_kinds': (0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)}, 'precall': {'success': 82, 'failure': 9, 'hit': 79148, 'deferred': 234, 'miss': 0, 'deopt': 0, 'failure_kinds': (0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 7, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0)}}

This helps us to more precisely analyze the impact of optimization on benchmark.

…rame` (GH-93284)

ezio-melotti transferred this issue from another repository Apr 10, 2022

penguin-wwy mentioned this issue May 5, 2022

gh-90230: Python API for specialization stats #92323

Closed

markshannon added a commit that referenced this issue May 27, 2022

GH-90230: Add stats to breakdown the origin of calls to `PyEval_EvalF…

bbcf424

…rame` (GH-93284)

bedevere-bot mentioned this issue Sep 6, 2022

GH-90230: Fix warnings and failures with --enable-pystats #96622

Merged

brandtbucher added a commit that referenced this issue Sep 9, 2022

GH-90230: Fix warnings and failures with --enable-pystats (GH-96622)

b4954b1

iritkatriel added the interpreter-core (Objects, Python, Grammar, and Parser dirs) label Nov 29, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Unify handling of stats in the CPython VM #90230

Unify handling of stats in the CPython VM #90230

markshannon commented Dec 14, 2021

markshannon commented Dec 14, 2021

markshannon commented Dec 15, 2021

tiran commented Dec 15, 2021

markshannon commented Dec 16, 2021

tiran commented Dec 16, 2021

markshannon commented Dec 16, 2021

markshannon commented Dec 17, 2021

markshannon commented Jan 28, 2022

markshannon commented Feb 1, 2022

markshannon commented Feb 2, 2022

markshannon commented Feb 7, 2022

markshannon commented Feb 7, 2022

markshannon commented Feb 9, 2022

brandtbucher commented Feb 16, 2022

penguin-wwy commented Apr 29, 2022 •

edited

Unify handling of stats in the CPython VM #90230

Unify handling of stats in the CPython VM #90230

Comments

markshannon commented Dec 14, 2021

markshannon commented Dec 14, 2021

markshannon commented Dec 15, 2021

tiran commented Dec 15, 2021

markshannon commented Dec 16, 2021

tiran commented Dec 16, 2021

markshannon commented Dec 16, 2021

markshannon commented Dec 17, 2021

markshannon commented Jan 28, 2022

markshannon commented Feb 1, 2022

markshannon commented Feb 2, 2022

markshannon commented Feb 7, 2022

markshannon commented Feb 7, 2022

markshannon commented Feb 9, 2022

brandtbucher commented Feb 16, 2022

penguin-wwy commented Apr 29, 2022 • edited

penguin-wwy commented Apr 29, 2022 •

edited