Author gvanrossum
Recipients BTaskaya, Mark.Shannon, brandtbucher, eric.snow, gregory.p.smith, gvanrossum, pablogsal, rhettinger, serhiy.storchaka, terry.reedy, tim.peters
Date 2021-04-06.21:41:49
Message-id <CAP7+vJL0zZvko4tpR3276CPpqW_R9RDtg3zS9ZQRmNu9O+ieLQ@mail.gmail.com>
In-reply-to <1617741819.88.0.274343280112.issue43684@roundup.psfhosted.org>
Content
Yeah, I've received many warnings about benchmark hacking already, and we
won't fall to that.

For static statistics, it's easy enough to download a thousand of
the most popular projects from GitHub or PyPI and just try to compile all
the .py files found there.
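
A minimal sketch of what such a static pass could look like, assuming the statistic of interest is a per-opcode instruction count (the message does not say exactly which statistics are meant). The function names `static_opcode_counts` and `static_counts_for_tree` are hypothetical, and the sketch only compiles the files; it never imports or runs them:

```python
import dis
import types
from collections import Counter
from pathlib import Path


def iter_code_objects(code):
    """Yield a code object and every code object nested in its constants."""
    yield code
    for const in code.co_consts:
        if isinstance(const, types.CodeType):
            yield from iter_code_objects(const)


def static_opcode_counts(source, filename="<string>"):
    """Compile source (str or bytes) and count each opcode name once per occurrence."""
    counts = Counter()
    top = compile(source, filename, "exec")
    for code in iter_code_objects(top):
        for instr in dis.get_instructions(code):
            counts[instr.opname] += 1
    return counts


def static_counts_for_tree(root):
    """Sum opcode counts over all .py files under root; also report how many failed."""
    total = Counter()
    failed = 0
    for path in Path(root).rglob("*.py"):
        try:
            # Reading bytes lets compile() honor any encoding declaration.
            total.update(static_opcode_counts(path.read_bytes(), str(path)))
        except (SyntaxError, ValueError, OSError):
            # Files for another Python version, templates, etc. are skipped.
            failed += 1
    return total, failed
```

The opcode names that come out depend on the interpreter version doing the compiling, so counts from different Python versions are not directly comparable.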

For dynamic statistics, the problem is always to come up with a
representative run. I know how to do that for mypy (just running mypy over
itself should be pretty good), but we can't do that for random projects
from PyPI or GitHub -- not only do I not want to run 3rd party code I
haven't reviewed carefully first, but usually there's no way to
know how to even run it -- configuration, environment, input data all need
to be fixed, and dependencies need to be installed.

Running test suites (whether it's the stdlib tests or some other set of
unit tests for some other code base) is likely to give skewed results,
given how people often write tests (trying to hit each line of code at
least once but as few times as possible so the tests run quickly). This is
where benchmarks provide some limited value, since they come with a
standard way to run non-interactively.

I wish it were the case that static stats were a good proxy for dynamic
stats, since then we could just compute static stats for a large amount of
3rd party code, but I'm not too sure of that. So I would like to have more
examples that we can measure both statically and dynamically -- if the
dynamic numbers look sufficiently similar to the static numbers, we can see
that as an indication that static numbers are a good indicator, and if they
look different, it means that we're going to have to collect more examples
in order to make progress.
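
One way to make "sufficiently similar" concrete, as a sketch only: normalize both sets of counts to frequencies and compute the total variation distance between them. The choice of metric and the names `to_frequencies` and `total_variation_distance` are assumptions made for illustration; the message does not propose a specific measure or threshold.

```python
from collections import Counter


def to_frequencies(counts):
    """Convert raw counts to relative frequencies (empty input gives an empty dict)."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {key: value / total for key, value in counts.items()}


def total_variation_distance(static_counts, dynamic_counts):
    """Return a value in [0, 1]: 0 for identical distributions, 1 for disjoint ones."""
    p = to_frequencies(static_counts)
    q = to_frequencies(dynamic_counts)
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


# Illustrative, made-up counts: same opcodes, different proportions.
static = Counter({"LOAD_FAST": 50, "CALL_FUNCTION": 50})
dynamic = Counter({"LOAD_FAST": 900, "CALL_FUNCTION": 100})
print(total_variation_distance(static, dynamic))
```

Because both inputs are normalized first, the very different totals of a static count and a dynamic count do not affect the result; only the proportions do.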
History
Date                 User        Action  Args
2021-04-06 21:41:49  gvanrossum  set     recipients: + gvanrossum, tim.peters, rhettinger, terry.reedy, gregory.p.smith, Mark.Shannon, eric.snow, serhiy.storchaka, pablogsal, brandtbucher, BTaskaya
2021-04-06 21:41:49  gvanrossum  link    issue43684 messages
2021-04-06 21:41:49  gvanrossum  create