Author gvanrossum
Recipients Zac Hatfield-Dodds, gvanrossum, p-ganssle, terry.reedy
Date 2021-05-13.22:15:44
Content
I would like to have a more thorough discussion about the desirability of using Hypothesis first, since I feel that there is a rather hard "sell" going on. I brought this up in the SC tracker (https://github.com/python/steering-council/issues/65) but I don't want to have the discussion there.

Copying some quoted text from there:

[me]
> > Can we perhaps have an open discussion on the merits of Hypothesis itself on python-dev before committing to this?

[Paul]
> You mean hypothesis specifically or property-based testing in general? I think that if we add property-based testing, hypothesis is probably the only natural choice. Of course, I can make this case on the thread if that was your intention.

I don't even know what property-based testing means (for Python) other than Hypothesis. What other frameworks support that? In any case, nobody has been promoting anything else, but Zac has been on our case for over a year. :-)

[me]
> > It seems to promote a coding style that's more functional than is good for Python, and its inner workings seem too magical to me. Also the decorator-based DSL looks pretty inscrutable to me.

[Paul]
> Do you mean it's promoting a coding style within the tests, or in the public API? When designing zoneinfo I didn't think about how easy it would be to test with Hypothesis at all as part of the API and it was basically frictionless.

I meant in the public API. I don't doubt that this worked well for zoneinfo, but that doesn't prove it's any better than any other framework (maybe you would have been ecstatic if you could have used pytest as well :-). Fundamentally, I am a bit suspicious of Hypothesis' origins in Haskell, a language and community with a rather different approach to programming than Python's. Don't get me wrong, I don't think there's anything wrong with Haskell (except that I've noticed it's particularly popular with the IQ >= 150 crowd :-). It's just different from Python, and just as we don't want to encourage writing "Fortran or Java in Python", I don't think it's a good idea to recommend writing "Haskell in Python".

> I think there are situations where it makes sense to write functional-style strategies because it can in some situations give hypothesis more information about how the strategies are transformed (and thus allow it to optimize how the data is drawn rather than drawing from the full set and discarding a bunch of stuff that doesn't match, or doing a bunch of extra logic on each draw), but it's by no means required. I'd say most of my "complex strategies" are decorated functions rather than chains of maps and filters.

Ah, here we get to the crux of the matter. What's a "strategy"? Maybe I would be more inclined to support Hypothesis if it were clearer what it does. I like libraries that do stuff for me that I know how to do myself. With Hypothesis, the deeper I get into the docs, the more I get the impression that there is deep magic going on that a poor sap like myself isn't supposed to understand. When I write @given(st.lists(st.integers())) what does that do? I haven't the foggiest idea, and from the docs I get the impression I'm not supposed to worry about it, and *that* worries me. Clearly it can't be generating all lists of 0 integers, followed by all lists of 1 integer, followed by all lists of 2 integers... So in what order does it generate test data? The docs are pretty vague about that.

(I'm not playing dumb here. I really have no idea how to go about that, and it seems that this is at the heart of Hypothesis and its Haskell ancestors. Presumably Ph.D. theses went into making this sort of thing work. Perhaps more effort could be expended explaining this part to laypeople?)
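
For concreteness, this is the shape of test I'm puzzling over (a minimal sketch; the sortedness property is just a stand-in):

from hypothesis import given, strategies as st

@given(st.lists(st.integers()))
def test_sorting_is_idempotent(xs):
    # As I understand the docs, Hypothesis calls this repeatedly with
    # pseudo-randomly drawn lists (not an exhaustive enumeration) and
    # shrinks any failing input to a minimal counterexample.
    assert sorted(sorted(xs)) == sorted(xs)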

> From the stdlib-property-tests repo, this seems like the most complicated strategy I'm seeing employed, and I find it fairly simple to understand, and I don't know how easy to read any code would be that generates trees of arbitrary depth containing objects with specific properties.

I have never tried that sort of thing, so I'll take your word for it.
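
(For readers following along, the kind of tree-generating strategy under discussion looks roughly like this; a sketch in the style of the JSON example in the Hypothesis docs, not code taken from stdlib-property-tests:)

from hypothesis import strategies as st

# Trees of arbitrary depth: scalar leaves, where each level may wrap
# its children in a list or a string-keyed dict.
json_like = st.recursive(
    st.none() | st.booleans() | st.integers() | st.text(),
    lambda children: st.lists(children)
    | st.dictionaries(st.text(), children),
    max_leaves=25,
)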

But there are other strategies. It seems that a lot of the strategies are answers to a series of questions, along the lines of "how do I generate valid URLs"; and then "email addresses"; and then "IP addresses"; and then suddenly we find a strategy that generates *functions*, and then we get "recursive things", and it keeps going. Timezone keys?! No wonder you found it easy to use. :-)
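
For the record, these all appear to be real, shipped strategies (names as of recent Hypothesis releases):

from hypothesis import strategies as st

st.emails()         # email addresses
st.ip_addresses()   # IPv4 and IPv6 addresses
st.functions()      # stub callables matching a given signature
st.timezone_keys()  # IANA keys such as "America/New_York"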

I think I've heard that there's a strategy somewhere that generates random Python programs. How would that even work? Would it help me find corner cases in the grammar? I suppose we could actually use something like that for the "faster CPython" project, to validate that the optimizer doesn't break things. But how? Do I just write

from hypothesis import given, strategies as st

@given(st.python_programs(), st.values())  # hypothetical strategies; neither exists by these names
def test_optimizer(func, arg):
    # 'optimizer' stands in for the faster-CPython optimizer under test
    expected = func(arg)
    optimized = optimizer(func)(arg)
    assert optimized == expected

???

Methinks that the hard work then is writing the python_programs() strategy, so I'd want to understand that before trusting it.
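
To illustrate what goes into writing even a trivial strategy, here's a toy of my own devising (a real python_programs() would be nothing like this):

from hypothesis import strategies as st

@st.composite
def tiny_functions(draw):
    # Build one-expression functions from a closed set of templates,
    # so every generated "program" is valid by construction.
    template = draw(st.sampled_from(["x + {}", "x * {}", "x - {}"]))
    constant = draw(st.integers(-10, 10))
    return eval("lambda x: " + template.format(constant))

Presumably a real one would have to build syntactically valid programs from the grammar itself, and that's exactly the machinery I'd want explained.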

I suspect I've repeated myself enough times at this point :-), so I hope I'll get some sort of answer. Somehow it seems that, in its zeal to "sell" itself as super-easy to use in all sorts of circumstances, Hypothesis has buried the lede: how do you write effective strategies?