Message 210030 - Python tracker

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Author	wolma
Recipients	gregory.p.smith, larry, ncoghlan, oscarbenjamin, steven.daprano, wolma
Date	2014-02-02.22:04:17
SpamBayes Score	-1.0
Marked as misclassified	Yes
Message-id	<1391378657.6.0.445623565529.issue20481@psf.upfronthosting.co.za>
In-reply-to

Content
Hi Oscar, well, I haven't used sympy much, and I have no experience with the others, but in light of your comment I quickly checked sympy and gmpy2. You are right about them still not using the numbers ABCs, however, on your advise I also checked how the current statistics module implementation handles their numeric types and the answer is: it doesn't, and this is totally independent of the _coerce_types issue. For sympy: the problem lies with statistics._exact_ratio, which cannot convert sympy numeric types to a numerator/denominator tuple (a prerequisite for _sum) For gmpy2: the problem occurs just one step further down the road. gmpy2.Rationals have numerator and denominator properties, so _exact_ratio knows how to handle them, but the elements of the returned tuple are of type gmpy2.mpz (gmpy2's integer equivalent) and when _sum tries to convert the tuple into a Fraction you get: TypeError: both arguments should be Rational instances which is precisely because the mpz type is not integrated into the numbers tower. This last example is very illustrative I think because it shows that already now the standard library (the fractions module in this case) requires numeric types to comply with the numeric tower, so statistics would not be without precedent, and I think this is totally justified: after all this is the standard library (can't believe I'm saying this since I really got into this sort of by accident) and third party libraries should seek compatibility, but the standard library just needs to be self-consistent. I guess using ABCs over a duck-typing approach when coercing types, in fact, offers a huge advantage for third party libraries since they only need to register their types with the numbers ABC to achieve compatibility, while they need to consider complicated subclassing schemes with the current approach (of course, I am only talking about compatibility with _coerce_types here, which is the focus of this issue. Other parts of statistics may impose further restrictions as we've just seen for _sum). Finally, regarding speed. The fundamental difference between the current implementation and my proposed change is that the current version calls _coerce_types for every number in the input sequence, so performance is critical here, but in my version _coerce_types gets called only once and then operates on a really small set of input types, so it is absolutely not the time-critical step in the overall performance of _sum. For this very reason I made no effort at all to optimize the code, but just tried to keep it as simple and clear as possible. This, in fact, is IMHO the second major benefit of my proposal for _coerce_types (besides making its result order-independent). Read the current code for _coerce_types, then the proposed one. Try to consider all their ramifications and side-effects and decide which one's easier to understand and maintain. Best, Wolfgang

Hi Oscar,
well, I haven't used sympy much, and I have no experience with the others, but in light of your comment I quickly checked sympy and gmpy2.
You are right about them still not using the numbers ABCs, however, on your advise I also checked how the current statistics module implementation handles their numeric types and the answer is: it doesn't, and this is totally independent of the _coerce_types issue.
For sympy: the problem lies with statistics._exact_ratio, which cannot convert sympy numeric types to a numerator/denominator tuple (a prerequisite for _sum)
For gmpy2: the problem occurs just one step further down the road. gmpy2.Rationals have numerator and denominator properties, so _exact_ratio knows how to handle them, but the elements of the returned tuple are of type gmpy2.mpz (gmpy2's integer equivalent) and when _sum tries to convert the tuple into a Fraction you get:

TypeError: both arguments should be Rational instances

which is precisely because the mpz type is not integrated into the numbers tower.

This last example is very illustrative I think because it shows that already now the standard library (the fractions module in this case) requires numeric types to comply with the numeric tower, so statistics would not be without precedent, and I think this is totally justified:
after all this is the standard library (can't believe I'm saying this since I really got into this sort of by accident) and third party libraries should seek compatibility, but the standard library just needs to be self-consistent.

I guess using ABCs over a duck-typing approach when coercing types, in fact, offers a huge advantage for third party libraries since they only need to register their types with the numbers ABC to achieve compatibility, while they need to consider complicated subclassing schemes with the current approach (of course, I am only talking about compatibility with _coerce_types here, which is the focus of this issue. Other parts of statistics may impose further restrictions as we've just seen for _sum).

Finally, regarding speed. The fundamental difference between the current implementation and my proposed change is that the current version calls _coerce_types for every number in the input sequence, so performance is critical here, but in my version _coerce_types gets called only once and then operates on a really small set of input types, so it is absolutely not the time-critical step in the overall performance of _sum.
For this very reason I made no effort at all to optimize the code, but just tried to keep it as simple and clear as possible.

This, in fact, is IMHO the second major benefit of my proposal for _coerce_types (besides making its result order-independent). Read the current code for _coerce_types, then the proposed one. Try to consider all their ramifications and side-effects and decide which one's easier to understand and maintain.

Best,
Wolfgang

History
Date	User	Action	Args
2014-02-02 22:04:17	wolma	set	recipients: + wolma, gregory.p.smith, ncoghlan, larry, steven.daprano, oscarbenjamin
2014-02-02 22:04:17	wolma	set	messageid: <1391378657.6.0.445623565529.issue20481@psf.upfronthosting.co.za>
2014-02-02 22:04:17	wolma	link	issue20481 messages
2014-02-02 22:04:17	wolma	create