Message 390378 - Python tracker

Author	gvanrossum
Recipients	BTaskaya, Mark.Shannon, brandtbucher, eric.snow, gregory.p.smith, gvanrossum, pablogsal, rhettinger, serhiy.storchaka, terry.reedy, tim.peters
Date	2021-04-06.21:41:49
SpamBayes Score	-1.0
Marked as misclassified	Yes
Message-id	<CAP7+vJL0zZvko4tpR3276CPpqW_R9RDtg3zS9ZQRmNu9O+ieLQ@mail.gmail.com>
In-reply-to	<1617741819.88.0.274343280112.issue43684@roundup.psfhosted.org>

Content
Yeah, I've received many warnings about benchmark hacking already, and we
won't fall for that.

For static statistics, it's easy enough to download the thousand most
popular projects from either GitHub or PyPI and just try to compile all
the .py files found there.
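
A minimal sketch of that static approach, assuming "static statistics" means opcode frequencies as reported by the stdlib `dis` module. The helper names `iter_code_objects` and `static_opcode_counts` are hypothetical, and files that fail to compile on the running Python version are simply skipped:

```python
import dis
import types
import warnings
from collections import Counter
from pathlib import Path


def iter_code_objects(code: types.CodeType):
    """Yield `code` and, recursively, every code object nested in its constants."""
    yield code
    for const in code.co_consts:
        if isinstance(const, types.CodeType):
            yield from iter_code_objects(const)


def static_opcode_counts(root: str) -> Counter:
    """Count opcode names over every .py file under `root` that compiles."""
    counts: Counter = Counter()
    for path in Path(root).rglob("*.py"):
        try:
            source = path.read_bytes()
            with warnings.catch_warnings():
                warnings.simplefilter("ignore")  # e.g. SyntaxWarning for bad escapes
                module_code = compile(source, str(path), "exec")
        except (SyntaxError, ValueError, OSError, RecursionError):
            continue  # skip files this Python version can't compile
        # Functions, classes and lambdas live as nested code objects in co_consts.
        for code in iter_code_objects(module_code):
            counts.update(instr.opname for instr in dis.get_instructions(code))
    return counts
```

Something like `static_opcode_counts("downloaded_projects").most_common(20)` (a hypothetical directory) would then give the most frequent opcodes across the whole corpus. Note that opcode names differ between CPython versions, so counts are only comparable when collected on the same interpreter.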

For dynamic statistics, the problem is always to come up with a
representative run. I know how to do that for mypy (just running mypy over
itself should be pretty good), but we can't do that for random projects
from PyPI or GitHub -- not only do I not want to run 3rd party code I
haven't carefully reviewed first, but usually there's no way to know how
to even run it: configuration, environment, and input data all need to be
set up, and dependencies need to be installed.
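
For a run we do trust (such as mypy over itself), a rough way to collect dynamic counts without a special interpreter build is per-opcode tracing via `sys.settrace` and `frame.f_trace_opcodes`. This is a slow, hedged sketch, not how the numbers in this issue were gathered; the function name `dynamic_opcode_counts` is hypothetical, and it only sees opcodes executed in Python frames (C code is invisible to it):

```python
import dis
import sys
from collections import Counter


def dynamic_opcode_counts(func, *args, **kwargs) -> Counter:
    """Run func(*args, **kwargs) and count opcodes executed in Python frames."""
    counts: Counter = Counter()
    opnames_by_code = {}  # code object -> {byte offset: opname}

    def tracer(frame, event, arg):
        if event == "call":
            frame.f_trace_opcodes = True  # ask for an 'opcode' event per instruction
        elif event == "opcode":
            code = frame.f_code
            table = opnames_by_code.get(code)
            if table is None:
                table = {i.offset: i.opname for i in dis.get_instructions(code)}
                opnames_by_code[code] = table
            # f_lasti is the byte offset of the instruction about to execute.
            counts[table.get(frame.f_lasti, "<unknown>")] += 1
        return tracer  # keep tracing this frame

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        func(*args, **kwargs)
    finally:
        sys.settrace(previous)  # restore any tracer (debugger, coverage) in place
    return counts
```

The tracing overhead makes this impractical for large workloads, but it does not change which opcodes run, only how long they take.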

Running test suites (whether the stdlib tests or some other code base's
unit tests) is likely to give skewed results, given how people often write
tests: trying to hit each line of code at least once, but as few times as
possible so the tests run quickly. This is where benchmarks provide some
limited value, since they come with a standard way to run them
non-interactively.

I wish static stats were a good proxy for dynamic stats, since then we
could just compute static stats for a large body of 3rd party code, but
I'm not too sure of that. So I would like to have more examples that we
can measure both statically and dynamically -- if the dynamic numbers look
sufficiently similar to the static numbers, we can take that as an
indication that static numbers are a good proxy, and if they look
different, we're going to have to collect more examples in order to make
progress.
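
One way to make "sufficiently similar" concrete is to normalize both opcode counts into frequency distributions and compute their total variation distance (0 means identical, 1 means disjoint). This metric is an assumption chosen for illustration; the message doesn't name one, and the helper names and the toy counts below are hypothetical:

```python
from collections import Counter


def to_distribution(counts: Counter) -> dict:
    """Turn raw opcode counts into relative frequencies summing to 1."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()} if total else {}


def total_variation_distance(a: Counter, b: Counter) -> float:
    """Half the L1 distance between two normalized opcode distributions."""
    pa, pb = to_distribution(a), to_distribution(b)
    names = pa.keys() | pb.keys()
    return 0.5 * sum(abs(pa.get(k, 0.0) - pb.get(k, 0.0)) for k in names)


# Toy numbers, not real measurements.
static = Counter(LOAD_FAST=50, LOAD_CONST=30, CALL=20)
dynamic = Counter(LOAD_FAST=70, LOAD_CONST=10, CALL=20)
print(total_variation_distance(static, dynamic))  # about 0.2
```

Where to draw the line between "similar" and "different" is still a judgment call; comparing per-project distances across many examples is what would tell us whether static counts track dynamic ones.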

History
Date	User	Action	Args
2021-04-06 21:41:49	gvanrossum	set	recipients: + gvanrossum, tim.peters, rhettinger, terry.reedy, gregory.p.smith, Mark.Shannon, eric.snow, serhiy.storchaka, pablogsal, brandtbucher, BTaskaya
2021-04-06 21:41:49	gvanrossum	link	issue43684 messages
2021-04-06 21:41:49	gvanrossum	create