Move Demo/classes/Rat.py to Lib/fractions.py and fix it up. #46023

Closed
jyasskin mannequin opened this issue Dec 21, 2007 · 92 comments
Labels
stdlib (Python modules in the Lib dir), type-bug (An unexpected behavior, bug, or error)

Comments

@jyasskin
Mannequin

jyasskin mannequin commented Dec 21, 2007

BPO 1682
Nosy @gvanrossum, @rhettinger, @facundobatista, @mdickinson, @ncoghlan, @wm75
Files
  • rational.patch
  • rational.patch: V2, ported to 2.6
  • rational.patch: V3, still minimal
  • rational_tweaks.patch: Some tweaks on top of the original implementation
  • lehmer_gcd.py: Revised Lehmer gcd algorithm
  • lehmer_gcd_2.py: Benchmark gcd implementations
  • lehmer_gcd.patch
  • fraction_inline_arith.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = None
    closed_at = <Date 2008-07-01.20:35:22.398>
    created_at = <Date 2007-12-21.17:41:30.425>
    labels = ['type-bug', 'library']
    title = 'Move Demo/classes/Rat.py to Lib/fractions.py and fix it up.'
    updated_at = <Date 2014-09-25.11:53:54.268>
    user = 'https://bugs.python.org/jyasskin'

    bugs.python.org fields:

    activity = <Date 2014-09-25.11:53:54.268>
    actor = 'wolma'
    assignee = 'jyasskin'
    closed = True
    closed_date = <Date 2008-07-01.20:35:22.398>
    closer = 'mark.dickinson'
    components = ['Library (Lib)']
    creation = <Date 2007-12-21.17:41:30.425>
    creator = 'jyasskin'
    dependencies = []
    files = ['9021', '9114', '9152', '9201', '9464', '9485', '9486', '9570']
    hgrepos = []
    issue_num = 1682
    keywords = ['patch']
    message_count = 92.0
    messages = ['58949', '59424', '59425', '59582', '59583', '59608', '59614', '59615', '59633', '59635', '59636', '59639', '59640', '59645', '59650', '59651', '59656', '59657', '59659', '59665', '59747', '59755', '59772', '59773', '59870', '59884', '59886', '59958', '59982', '60080', '60081', '60131', '60132', '60133', '60137', '61737', '61995', '62001', '62005', '62010', '62011', '62013', '62028', '62029', '62099', '62189', '62192', '62218', '62222', '62227', '62236', '62249', '62251', '62252', '62253', '62254', '62256', '62257', '62258', '62271', '62305', '62311', '62318', '62320', '62324', '62335', '62337', '62340', '62380', '62401', '62403', '62406', '62547', '62550', '62564', '62566', '62567', '62568', '62671', '62672', '63087', '63126', '68725', '68732', '68757', '68822', '68823', '68843', '68851', '69020', '69027', '69069']
    nosy_count = 7.0
    nosy_names = ['gvanrossum', 'rhettinger', 'facundobatista', 'mark.dickinson', 'ncoghlan', 'jyasskin', 'wolma']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = None
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue1682'
    versions = ['Python 2.6', 'Python 3.0']

    @jyasskin
    Mannequin Author

    jyasskin mannequin commented Dec 21, 2007

    This is written against the 3.0 branch, but before submitting it, I need
    to backport numbers.py and submit it against 2.6 instead. I'm posting it
    now to get a head start on comments.

    The numbers.py part is necessary for this to work but will be a separate
    patch.

    @jyasskin jyasskin mannequin added the stdlib and type-bug labels Dec 21, 2007
    @gvanrossum gvanrossum self-assigned this Dec 21, 2007
    @mdickinson
    Member

    Some random comments: take with a large pinch of salt

    (1) In __init__ I think you want something like:

      self._numerator = int(numerator)//g

    instead of

      self._numerator = int(numerator / g)

    and similarly for the denominator: true division detours through float and rounds
    large integers (see the sketch after this list). At the moment I get, for example:

    >>> Rational(10**23)
    rational.Rational(99999999999999991611392,1)

    (2) What's the reason for repr of a Rational being "rational.Rational(...)", while repr
    of a Decimal is just "Decimal(...)"? I'm not suggesting that either is wrong; just
    wondering whether there's some sort of guiding principle that would suggest one or the
    other.

    (3) My first thought on looking at the _gcd function was: "Shouldn't there be an abs()
    somewhere in there"; since the gcd of two (possibly negative) integers is often
    (usually?) defined as the largest *nonnegative* common divisor. But on closer
    examination it's clear that the result of _gcd(a, b) has the same sign as that of b
    (unless b == 0). So _gcd very cleverly chooses exactly the right sign so that the
    denominator after rescaling is positive. I'm not sure whether this is a happy accident
    or clever design, but either way it probably deserves a comment.

    (4) the __ceil__ operation could be spelt more neatly as

    return -(-a.numerator // a.denominator)

    ...but perhaps that just amounts to obfuscation :) (the sketch after this list
    spells out why the identity holds)

    (5) It looks as though two-argument round just truncates when the second argument is
    negative. Presumably this isn't what's wanted here?

    >>> round(Rational(26), -1)  # expecting Rational(30, 1)
    rational.Rational(20,1)

    (6) The behaviour of the equality test is questionable when floats are involved. For
    example:

    >>> 10**23 == float(10**23)  # expect False
    False
    >>> Rational(10**23) == float(10**23)  # I'd also expect False here
    True

    Ideally, if x is a Rational and y is a float then I'd suggest that x == y should return
    True only when the numeric values really are equal. Coding this could be quite tricky,
    though. One option would be to convert the float to an (exactly equal) Rational
    first---there's code to do this in the version of Rational.py in the sandbox.

    (7) For purely selfish reasons, I for one will be very happy if/when this makes it into
    2.6/3.0: I use Python a lot for teaching (geometry, number theory, linear algebra,
    ...); it's natural to work with rational numbers in this context; and it'll be much
    easier to tell an interested student to just download Python than to tell them they also
    need gmp and gmpy, or some other third party package, just to try out the code examples
    I've given them.
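
    A minimal runnable illustration of points (1) and (4), assuming Python 3
    division semantics (the patch was originally written against the 3.0 branch):

        # Point (1): true division detours through float, which rounds 10**23;
        # floor division stays exact in integer arithmetic.
        numerator, g = 10**23, 1
        print(int(numerator / g))    # 99999999999999991611392 (rounded via float)
        print(int(numerator) // g)   # 100000000000000000000000 (exact)

        # Point (4): the ceiling one-liner works because floor division rounds
        # toward negative infinity, so ceil(n/d) == -((-n) // d) for d > 0.
        def ceil_div(n, d):
            return -(-n // d)

        assert ceil_div(7, 2) == 4 and ceil_div(-7, 2) == -3 and ceil_div(6, 2) == 3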

    @mdickinson
    Member

    Two more questions:

    (1) Should a Rational instance be hashable?
    (2) Should "Rational(5,2) == Decimal('2.5')" evaluate to True or False?

    If Rationals are hashable and 2.5 == Rational(5, 2) == Decimal('2.5') then the
    rule that objects that compare equal should have equal hash is going to make
    life *very* interesting...

    @jyasskin
    Mannequin Author

    jyasskin mannequin commented Jan 9, 2008

    Thanks again for the excellent comments.

    __init__: good catch.

    repr(Rational): The rule for repr is "eval(repr(object)) == object".
    Unfortunately, that doesn't decide between the two formats, since both
    assume some particular import statements. I picked the one more likely
    to be unique, and I assume Decimal picked the shorter one. I can go
    either way.

    _gcd's sign: It's a happy accident for me. Possibly Sjoerd Mullender
    designed it that way. I've added a comment and a test.

    __ceil__: I like that implementation better.

    2-argument round: Fixed and tested.

    equality: Very good point. I've stolen the sandbox code and added
    Rational.from_float() using it. I think I also need to make this change
    to the comparisons.

    hashing: oops, yes these should be hashable. Decimal cheats by comparing
    != even to floats that it's numerically equal to, so I'm going to assume
    that they also want Rational(5,2) != Decimal('2.5').

    The new patch is against 2.6.

    @rhettinger
    Contributor

    FWIW, I'm -1 on moving this module to the standard library. There has
    been basically *zero* demand for something like this. Because rational
    implementations seem to be more fun to write than to use, I think there
    are some competing implementations.

    @gvanrossum
    Member

    Raymond, do you *always* have to be such a grinch?

    The mere existence of competing implementations proves that there's a
    need. Hopefully having one in the stdlib will inspire others to
    contribute improvements rather than starting from scratch.

    @rhettinger
    Contributor

    Not sure I'm the grinch on this one. I thought PEPs on this were
    rejected long ago and no one complained afterwards. After years of
    watching newsgroup discussions and feature requests, I do not recall a
    single request or seeing a single problem that was better solved with a
    rational package. On python-help and the tutorial newsgroup, I've
    never seen a question about the existing module in the Tools directory
    or even a question on the topic.

    I think rational modules are ubiquitous for the same reason as Sudoku
    solvers -- they are cute, an easy exercise, and fun to write. There is
    some variation in how each chooses to make compromises so the
    denominators do not get unusably large. Also, there is some variation
    in sophistication of the GCD/LCD algorithm.

    I had thought the standards for inclusion in the standard library had
    risen quite a bit (best-in-class category killers and whatnot). ISTM,
    this is aspiring cruft -- I cannot remember encountering a real
    business or engineering problem that needed rational arithmetic (the
    decimal module seems to meet the rare need for high precision
    arithmetic).

    All that being said, maybe the module will serve some sort of
    educational purpose, or will serve to show off the numeric tower
    abstract base classes.

    @facundobatista
    Member

    PEP 239 is Rejected (http://www.python.org/dev/peps/pep-0239/).

    If a Rational data type is to be included in the stdlib, my
    recommendation is that the PEP first be reopened and pushed until it
    gets accepted.

    Also, note the kind of questions Mark is asking here: the analysis and
    answers to those questions belong in a PEP, not in a patch. We risk
    not being very rational (1/2 wink) in these decisions.

    I'd close this patch as Rejected (as with the PEP), at least until the
    PEP gets accepted (if ever).

    @gvanrossum
    Member

    The rejection of PEP-239 was years ago. PEP-3141 was accepted; it
    includes adding a rational type to the stdlib, and Jeffrey is doing this
    with my encouragement.

    The motivation is to have at least one example of each type in our
    modest numeric tower in the stdlib. I'm sure there will be plenty of
    uses for rational.py, e.g. in education. Of course, if someone has a
    better candidate or a proposed improvement, speak up now! It looks like
    Mark's suggestions can be treated as code review; I don't think we need
    a new PEP.

    Note that the mere existence of PEP-239 again points out that the demand
    is not zero.

    @rhettinger
    Contributor

    If this goes in, I have some recommendations:

    • Follow the decimal module's lead in assiduously avoiding
      float->rational conversions. Something like Rat.from_float(1.1) is
      likely to produce unexpected results (like locking in an inexact input
      and having an unexpectedly large denominator).

    • Likewise, follow decimal's lead in avoiding all automatic coercions
      from floats: Rational(3,10) + 0.3 for example. The two don't mix.

    • Consider following decimal's lead on having a context that can limit
      the maximum size of a denominator. There are various strategies for
      handling a limit overflow including raising an exception or finding the
      nearest rational up to the max denominator (there are some excellent
      though complex published algorithms for this) or rounding to the nearest
      fixed base (very easy). I'll dig out my HP calculator manuals at some
      point -- they had worked out a number of challenges with fractional
      arithmetic (including their own version of an Inexact tag).

    • Consider adding Decimal.from_rational and Rational.from_decimal. I
      believe these are both easy and can be done losslessly.

    • Automatic coercions to/from Decimal need to respect the active decimal
      context. For example the result of Rational(3,11) +
      Decimal('3.1415926') is context dependent and may not be commutative.

    • When in doubt, keep the API minimal so we don't lock-in design mistakes.

    • Test the API by taking a few numerical algorithms and seeing how well
      they work with rational inputs (for starters, try
      http://docs.python.org/lib/decimal-recipes.html ).

    • If you do put in a method that accepts floats, make sure that it can
      accept arguments to control the rational approximation. Ideally, you
      would get something like Rational.approximate(math.pi, 6)
      --> 355/113 that could produce the smallest rational approximation to
      a given level of accuracy.

    @facundobatista
    Member

    2008/1/9, Raymond Hettinger said:

    • Consider adding Decimal.from_rational and Rational.from_decimal. I
      believe these are both easy and can be done losslessly.

    If it's lossless, why not just allow Decimal(Rational(...)) and
    Rational(Decimal(...))?

    @rhettinger
    Contributor

    If it's lossless, why not just allow
    Decimal(Rational(...)) and Rational(Decimal(...))?

    Those conversions are fine.

    It's the float<-->rational conversions that are problematic. There are
    exact float to rational conversions using math.frexp() but I don't think
    the results tend to match what users expect (since 1.1 isn't exactly
    represented). Also, there may be overflow issues with trying to go from
    rationals to floats when the denominator is very large.

    I think the history of rational modules is that simplistic
    implementations get built and then the writers find that exploding
    denominators limit their usefulness for anything other than trivial
    problems. The solution is to limit denominators but that involves less
    trivial algorithms and a more sophisticated API.

    @gvanrossum
    Member

    On Jan 9, 2008 4:29 PM, Raymond Hettinger <report@bugs.python.org> wrote:

    I think the history of rational modules is that simplistic
    implementations get built and then the writers find that exploding
    denominators limit their usefulness for anything other than trivial
    problems. The solution is to limit denominators but that involves less
    trivial algorithms and a more sophisticated API.

    A "rational" type that limits denominators (presumably by rounding)
    isn't needed -- we already have Decimal. The proposed rational type is
    meant for "pure" mathematical uses, just like Python's long ints.

    Regarding float->rational, I propose to refuse Rational(1.1) for the
    same reasons as Decimal(1.1) is refused, but to add a separate
    constructor (a class method?) that converts a float to a rational
    exactly (as long as it isn't nan or inf), large denominators be
    damned. This can help people who are interested in taking floating
    point numbers apart.

    float(Rational()) should be fine.

    @jyasskin
    Mannequin Author

    jyasskin mannequin commented Jan 10, 2008

    • Follow the decimal module's lead in assiduously avoiding
      float->rational conversions. Something like Rat.from_float(1.1) is
      likely to produce unexpected results (like locking in an inexact input
      and having an unexpectedly large denominator).

    I agree that it's not usually what people ought to use, and I don't
    think it's an essential part of the API. It might be a useful starting
    point for the approximation methods though. .trim() and .approximate()
    in http://svn.python.org/view/sandbox/trunk/rational/Rational.py?rev=40988&view=auto
    and Haskell's approxRational
    (http://www.haskell.org/ghc/docs/latest/html/libraries/base/src/Data-Ratio.html)
    start from rationals instead of floats. On the other hand, it might
    make sense to provide explicit methods to approximate from floats
    instead of asking people to execute the two-phase process. I'm happy
    to give it a different name or drop it entirely.

    • Likewise, follow decimal's lead in avoiding all automatic coercions
      from floats: Rational(3,10) + 0.3 for example. The two don't mix.

    I'm reluctant to remove the fallback to float, unless the consensus is
    that it's always a bad idea to support mixed-mode arithmetic (which is
    possible: I was surprised by the loss of precision of "10**23/1" while
    writing this). Part of the purpose of this class is to demonstrate how
    to implement cross-type operations. Note that it is an automatic
    coercion _to_ floats, so it doesn't look like you've gotten magic
    extra precision.
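
    A sketch of the fallback being described, assuming a Rational class with
    numerator/denominator attributes as in the patch under discussion (int
    handling and the reflected method are omitted); note the coercion is only
    ever *to* float:

        def __add__(self, other):
            # Exact path: Rational + Rational stays exact.
            if isinstance(other, Rational):
                return Rational(self.numerator * other.denominator +
                                other.numerator * self.denominator,
                                self.denominator * other.denominator)
            # Fallback path: Rational + float degrades to float, so the
            # result honestly advertises its inexactness.
            if isinstance(other, float):
                return float(self) + other
            return NotImplemented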

    • Consider following decimal's lead on having a context that can limit
      the maximum size of a denominator. There are various strategies for
      handling a limit overflow including raising an exception or finding the
      nearest rational up to the max denominator (there are some excellent
      though complex published algorithms for this) or rounding to the nearest
      fixed base (very easy). I'll dig out my HP calculator manuals at some
      point -- they had worked out a number of challenges with fractional
      arithmetic (including their own version of an Inexact tag).

    Good idea, but I'd like to punt that to a later revision if you don't
    mind. If we do punt, that'll force the default context to be "infinite
    precision" but won't prevent us from adding explicit contexts. Do you
    see any problems with that?

    • Consider adding Decimal.from_rational and Rational.from_decimal. I
      believe these are both easy and can be done losslessly.

    Decimal.from_rational(Rat(1, 3)) wouldn't be lossless, but
    Rational.from_decimal is easier than from_float. Then
    Decimal.from_rational() could rely on just numbers.Rational, so it
    would be independent of this module. Is that a method you'd want on
    Decimal anyway? The question becomes whether we want the rational module to
    import decimal to implement the typecheck, or just assume that
    .as_tuple() does the right thing. These are just as optional as
    .from_float() though, so we can also leave them for future
    consideration.

    • Automatic coercions to/from Decimal need to respect the active decimal
      context. For example the result of Rational(3,11) +
      Decimal('3.1415926') is context dependent and may not be commutative.

    Since I don't have any tests for that, I don't know whether it works.
    I suspect it currently returns a float! :) What do you want it to do?
    Unfortunately, giving it any special behavior reduces the value of the
    class as an example of falling back to floats, but that shouldn't
    necessarily stop us from making it do the right thing.

    • When in doubt, keep the API minimal so we don't lock-in design mistakes.

    Absolutely!

    • Test the API by taking a few numerical algorithms and seeing how well
      they work with rational inputs (for starters, try
      http://docs.python.org/lib/decimal-recipes.html ).

    Good idea. I'll add some of those to the test suite.

    • If you do put in a method that accepts floats, make sure that it can
      accept arguments to control the rational approximation. Ideally, you
      would get something like Rational.approximate(math.pi, 6)
      --> 355/113 that could produce the smallest rational approximation to
      a given level of accuracy.

    Right. My initial plan was to use
    Rational.from_float(math.pi).simplest_fraction_within(Rational(1,
    10**6)) but I'm not set on that, and, because there are several
    choices for the approximation method, I'm skeptical whether it should
    go into the initial revision at all.

    @rhettinger
    Contributor

    Decimal.from_rational(Rat(1, 3)) wouldn't be lossless

    It should be implemented as Decimal(1)/Decimal(3) and then let the
    context handle any inexactness.

    Rational.from_decimal is easier than from_float.

    Yes. Just use Decimal.as_tuple() to get the integer components.
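
    Runnable sketches of both directions (assuming plain int
    numerator/denominator pairs; special values like Decimal('NaN') are not
    handled here):

        from decimal import Decimal, getcontext

        def decimal_from_rational(num, den):
            # Divide in Decimal so the active context absorbs any
            # inexactness (e.g. 1/3 rounds to the context precision).
            return Decimal(num) / Decimal(den)

        def rational_from_decimal(dec):
            # Decimal.as_tuple() gives (sign, digits, exponent), so the
            # value is +/- digits * 10**exponent: an exact integer ratio.
            sign, digits, exp = dec.as_tuple()
            num = int(''.join(map(str, digits)))
            if sign:
                num = -num
            if exp >= 0:
                return num * 10 ** exp, 1
            return num, 10 ** -exp

        getcontext().prec = 28
        print(decimal_from_rational(1, 3))   # Decimal('0.3333333333333333333333333333')
        print(rational_from_decimal(Decimal('3.1415926')))   # (31415926, 10000000)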

    Then Decimal.from_rational() could rely on just numbers.Rational, so it
    would be independent of this module.
    Is that a method you'd want on Decimal anyway?

    Instead, just expand the decimal constructor to accept a rational input.

    Regarding float->rational, I propose to refuse Rational(1.1)
    for the same reasons as Decimal(1.1) is refused,

    +1

    but to add a separate constructor (a class method?) that
    converts a float to a rational exactly (as long as it
    isn't nan or inf), large denominators be damned. This
    can help people who are interested in taking floating
    point numbers apart.

    Here's how to pick the integer components out of a float:

    mant, exp = math.frexp(x)
    mant, exp = int(mant * 2.0 ** 53), exp-53
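
    A runnable version of that snippet (a sketch: x must be finite; nan and inf
    would need to be rejected first, as Guido suggests, and
    float.as_integer_ratio(), new in 2.6, provides the same service):

        import math

        def exact_float_to_ratio(x):
            mant, exp = math.frexp(x)   # x == mant * 2**exp, 0.5 <= |mant| < 1
            top, exp = int(mant * 2.0 ** 53), exp - 53
            if exp >= 0:
                return top << exp, 1
            return top, 1 << -exp

        # The double nearest to 1.1 is exactly 4953959590107546 / 2**52:
        assert exact_float_to_ratio(1.1) == (4953959590107546, 2 ** 52)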

    > * Likewise, follow decimal's lead in avoiding all
    > automatic coercions from floats:
    > Rational(3,10) + 0.3 for example. The two don't mix.

    I'm reluctant to remove the fallback to float,

    You will need some plan to scale down long integers that exceed the
    range of floats (right-shifting the numerator and denominator until they
    fit).

    @rhettinger
    Contributor

    One other thought, the Decimal and Rational constructors can be made to
    talk to each other via a magic method so that neither has to import the
    other (somewhat like we do now with __int__ and __float__).
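
    A sketch of that idea; the hook name __ratio__ is invented here purely for
    illustration (the real coordination mechanism was still being discussed):

        class Rational(object):
            @classmethod
            def from_number(cls, obj):
                # Any type exposing the (hypothetical) __ratio__ hook can
                # convert itself, without this module importing decimal.
                hook = getattr(obj, '__ratio__', None)
                if hook is None:
                    raise TypeError('no exact integer ratio for %r' % (obj,))
                num, den = hook()
                return cls(num, den)   # assumes the real constructor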

    @mdickinson
    Member

    Allowing automatic coercion to and from Decimal sounds dangerous and
    complicated to me. Mightn't it be safer just to disallow this (at least for
    now)?

    I think something like trim() (find the closest rational approximation with
    denominator bounded by a given integer) would be useful to have as a Rational
    method, but probably only for explicit use.

    I'm still worried about equality tests: is it possible to give a good reason
    why Decimal("2.5") == Rational(5, 2) should return False, while Rational(5, 2)
    == 2.5 returns True? Can someone articulate some workable principle that
    dictates when an int/float/complex/Rational/Decimal instance compares equal to
    some other int/float/complex/Rational/Decimal instance of possibly different
    type but the same numerical value?

    @gvanrossum
    Member

    All in all, Decimal is the odd man out -- it may never become a full
    member of the Real ABC. The built-in types complex, float and int (and
    long in 2.x) all support mixed-mode arithmetic, and this is one of the
    assumptions of the numeric tower -- and of course of mathematics. The
    new Rational type can be designed to fit in this system pretty easily,
    because it has no pre-existing constraints; but the Decimal type
    defies coercion, and is perhaps best left alone. It's already breaking
    the commonly understood properties of equality, e.g. 1.0 == 1 ==
    Decimal("1") != 1.0.

    --Guido


    @rhettinger
    Contributor

    Would it be reasonable then to not have any of the numeric tower stuff
    put in the decimal module -- basically leave it alone (no __ceil__,
    etc)?

    @gvanrossum
    Member

    Would it be reasonable then to not have any of the numeric tower stuff
    put in the decimal module -- basically leave it alone (no __ceil__,
    etc)?

    If that's the preference of the decimal developers, sure.

    @jyasskin
    Mannequin Author

    jyasskin mannequin commented Jan 11, 2008

    If the consensus is that Decimal should not implement Real, I'll reply
    to that thread and withdraw the patch.

    Raymond, do you want me to add Decimal.__init__(Rational) in this patch
    or another issue/thread?

    I don't understand the comment about scaling down long integers. It's
    already the case that float(Rational(10**23, 10**24 + 7))==0.1.

    Mark, I agree that .trim() and/or .approximate() (which I think is the
    same as Haskell's approxRational) would be really useful. Do you have
    particular reasons to pick .trim? Are those the best names for the
    concepts? I'd also really like to be able to link from their docstrings
    to a proof that their implementations are correct. Does anyone know of one?

    Finally, Decimal("2.5") != Rational(5, 2) because Decimal("2.5") != 2.5
    (so it'd make equality even more intransitive) and hash(Decimal("2.5"))
    != hash(2.5) so we couldn't follow the rule about equal objects implying
    equal hash codes, which is probably more serious. I don't have a
    principled explanation for Decimal's behavior, but given that it's
    fixed, I think the behavior of non-integral Rationals is determined too.
    On the other hand, I currently have a bug where Rational(3,1) !=
    Decimal("3") where the hash code could be consistent. I'll fix that by
    the next patch.

    @rhettinger
    Contributor

    If the consensus is that Decimal should not implement Real,
    I'll reply to that thread and withdraw the patch.

    Thanks. That would be nice.

    Raymond, do you want me to add Decimal.__init__(Rational)
    in this patch

    How about focusing on the rational module; when you're done, I'll
    adapt the Decimal constructor to accept a rational input.

    I don't understand the comment about scaling down long integers.

    My understanding is that you're going to let numerators and denominators
    grow arbitrarily large. When they get over several hundred digits each,
    you will have to scale them down before converting to a float. For
    example, when numerator=long('2'*400+'7') and
    denominator=long('3'*400+'1'), the long-->float conversion will
    overflow, so it is necessary to first scale down the two before
    computing the ratio: scale=325;
    float_ratio=float(numerator>>scale)/float(denominator>>scale)
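
    A runnable sketch of that scale-down (int.bit_length() is assumed for
    brevity; as a later comment notes, long.__truediv__ already does this
    internally, so the explicit version is only needed where float(long)
    would overflow):

        def ratio_to_float(numerator, denominator):
            # Shift both operands right until they fit in a double; the
            # shared shift cancels in the quotient. The result is a close
            # approximation, not necessarily correctly rounded.
            shift = max(numerator.bit_length(), denominator.bit_length()) - 52
            if shift > 0:
                numerator >>= shift
                denominator >>= shift
            return float(numerator) / float(denominator)

        print(ratio_to_float(int('2' * 400 + '7'), int('3' * 400 + '1')))
        # -> 0.666... (~2/3)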

    @mdickinson
    Member

    More comments, questions, and suggestions on the latest patch:

    1. In _binary_float_to_ratio, the indentation needs adjusting. Also 'top = 0L' could be replaced
      with 'top = 0', unless you need this code to work with Python 2.3 and earlier, in which case top
      needs to be a long so that << behaves correctly. Otherwise, this looks good.

    2. Rational() should work, and it should be possible to initialize from a string. I'd suggest that
      Rational(' 1729 '), Rational('-3/4') and (' +7/18 \n') should be acceptable: i.e. leading and trailing
      whitespace, and an optional - or + sign should be permitted. But space between the optional sign and the
      numerator, or on either side of the / sign, should be disallowed. Not sure whether the numerator and
      denominator should be allowed to have leading zeros or not---probably yes, for consistency with int().

    3. I don't think representing +/-inf and nan as Rationals (1/0, -1/0, 0/0) is a good idea---special values
      should be kept out of the Rational type, else it won't be an implementation of the Rationals any more---it'll
      be something else.

    4. hash(n) doesn't agree with hash(Rational(n)) for large integers (I think you already mentioned this
      above).

    5. Equality still doesn't work for complex numbers:

    >>> from rational import *
    >>> Rational(10**23) == complex(10**23)  # expect False here
    True
    >>> Rational(10**23) == float(10**23)
    False
    >>> float(10**23) == complex(10**23)
    True
    6. Why the parentheses around the str() of a Rational?

    7. How about having hash of a Rational (in the case that it's not equal to an integer or a float) be
      given by hash((self.numerator, self.denominator))? That is, let the tuple hash take care of avoiding
      lots of hash collisions. (A sketch follows below.)
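
    A sketch of that hashing scheme, assuming (per the discussion above) that
    values equal to an int or a float are special-cased first so that equal
    objects keep equal hashes:

        def __hash__(self):
            if self._denominator == 1:
                return hash(self._numerator)    # match hash() of ints
            f = float(self)
            if self == f:
                return hash(f)                  # match hash() of floats
            # Otherwise let the tuple hash spread out collisions.
            return hash((self._numerator, self._denominator))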

    @mdickinson
    Member

    About .trim and .approximate:

    it sounds like these are different, but quite closely related, methods: one takes a positive
    integer and returns the best approximation with denominator bounded by that integer; the other
    returns the 'smallest' rational in a given interval centered at the original rational. I guess
    we probably don't need both of these, but I can't give any good reason for preferring one over
    the other.

    I don't have anything to offer about names, either.

    I can try to find out whether the algorithms are published anywhere on the web---certainly,
    neither algorithm should be particularly hard to implement and prove the correctness of; they
    both essentially rely on computing the continued fraction development of the given rational.
    Almost any not-too-basic elementary number theory text should contain proofs of the relevant
    results about continued fractions.

    Am willing to help out with implementing either of these, if that's at all useful.
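
    For concreteness, a sketch of the continued-fraction computation being
    described (essentially what later shipped as Fraction.limit_denominator;
    this simplified version returns the last convergent and omits the final
    semiconvergent check that guarantees the closest answer):

        def best_approximation(n, d, max_den):
            # Walk the continued fraction of n/d, tracking the convergents
            # p1/q1, and stop before the denominator would exceed max_den.
            p0, q0, p1, q1 = 0, 1, 1, 0
            while d:
                a = n // d
                q2 = q0 + a * q1
                if q2 > max_den:
                    break
                p0, q0, p1, q1 = p1, q1, p0 + a * p1, q2
                n, d = d, n - a * d
            return p1, q1

        print(best_approximation(3141592653589793, 10 ** 15, 1000))  # (355, 113)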

    @jyasskin
    Mannequin Author

    jyasskin mannequin commented Jan 13, 2008

    _binary_float_to_ratio: Oops, fixed.

    Rational() now equals 0, but I'd like to postpone Rational('3/2') until
    there's a demonstrated need. I don't think it's as common a use as
    int('3'), and there's more than one possible format, so some real world
    experience will help (that is, quite possibly not in 2.6/3.0). I'm also
    postponing Rational(instance_of_numbers_Rational).

    +/-inf and nan are gone, and hash is fixed, at least until the next bug.
    :) Good idea about using tuple.

    Parentheses in str() help with reading things like
    "%s**2"%Rational(3,2), which would otherwise format as "3/2**2". I don't
    feel strongly about this.

    Equality and the comparisons now work for complex, but their
    implementations make me uncomfortable. In particular, two instances of
    different Real types may compare unequal to the nearest float, but equal
    to each other and have similar but inconsistent problems with <=. I can
    trade off between false ==s and false !=s, but I don't see a way to make
    everything correct. We could do better by making the intermediate
    representation Rational instead of float, but comparisons are inherently
    doomed to run up against the fact that equality is uncomputable on the
    computable reals, so it's probably not worthwhile to spend too much time
    on this.

    I've added a test that float(Rational(long('2'*400+'7'),
    long('3'*400+'1'))) returns 2.0/3. This works without any explicit
    scaling on my part because long.__truediv__ already handles it. If
    there's something else I'm doing wrong around here, I'd appreciate a
    failing test case.

    The open issues I know of are:

    • Is it a good idea to have both numbers.Rational and
      rational.Rational? Should this class have a different name?
    • trim and approximate: Let's postpone them to a separate patch (I do
      think at least one is worth including in 2.6+3.0). So that you don't
      waste time on them, we already have implementations in the sandbox and
      (probably) a good-enough explanation at
      http://en.wikipedia.org/wiki/Continued_fraction#Best_rational_approximations.
      Thanks for the offer to help out with them. :)
    • Should Rational.from_float() exist and with the current name? If
      there's any disagreement, I propose to rename it to
      Rational._from_float() to discourage use until there's more consensus.
    • Rational.from_decimal(): punted to a future patch. I favor this for
      2.6+3.0.
    • Rational('3/2') (see above)

    I think this is close enough to correct to submit and fix up the
    remaining problems in subsequent patches. If you agree, I'll do so.

    @mdickinson
    Member

    The latest version of rational.py looks good to me---nice work! (I haven't looked properly at
    numbers.py or test_rational.py, yet. I do plan to, eventually.)

    I do think it's important to be able to create Rational instances from strings: e.g., for
    easy reading from and writing to files. But maybe I'm alone in this opinion. You say there's
    more than one possible format---what other formats were you considering?

    And since you pointed it out, I think Rational(Rational(a, b)) should work too.

    There's also the not-entirely-insignificant matter of documentation :)

    Other than that, I don't see why this shouldn't go in.

    Other comments:

    I have a weak preference for no parentheses on the str() of a Rational, but it's no big deal
    either way.

    I agree that equality and comparisons are messy. This seems almost inevitable: one obvious
    cause is that the existing int <-> float comparisons already break the `numeric tower' model
    (push both operands to the highest common type before operating). So I'm not sure there can
    be an easy and elegant solution here :(

    I like the name Rational for this class. Maybe change the name of numbers.Rational instead?

    Postponing trim, approximate, from_decimal sounds fine to me.

    Finally: the very first line of rational.py is, I think, no longer accurate. Please add your
    name so everyone knows who to blame/credit/assign bug reports to :)

    @mdickinson
    Member

    Just had a quick look at numbers.py. Only two comments:

    1. I think there's a typo in the docstring for Inexact: one of those == should be a !=

    2. Not that it really matters now, but note that at least for Decimal, x-y is not the same
      as x+(-y) (I'm looking at Complex.__sub__), and +x is not a no-op (Real.real,
      Real.conjugate). In both cases signs of zeros can be problematic:

    >>> x = Decimal('-0')
    >>> y = Decimal('0')
    >>> x-y
    Decimal("-0")
    >>> x+(-y)
    Decimal("0")
    >>> x
    Decimal("-0")
    >>> +x
    Decimal("0")

    Of course the first point wouldn't matter anyway since Decimal already implements __sub__;
    the second means that if Decimal were to join Real, something would need to be done to
    avoid Decimal("-0").real becoming Decimal("0").

    @ncoghlan
    Contributor

    Thanks for adding the class methods back, Mark.

    On the constructor front, we got a fair payoff in the early Decimal days
    just by reordering the type checks in the constructor. Other than that,
    I'd have to dig into the code a bit more to offer any useful suggestions.

    @jyasskin
    Mannequin Author

    jyasskin mannequin commented Feb 14, 2008

    r60785 speeds the benchmark up from 10.536s to 4.910s (on top of
    whatever my __instancecheck__ fix did). Here are the remaining
    interesting-looking calls:

    ncalls  tottime  percall  cumtime  percall  filename:lineno(function)
    ...
         1    0.207    0.207    4.999    4.999  {sum}
    199996    1.587    0.000    3.419    0.000  fractions.py:58(__new__)
     99997    0.170    0.000    3.236    0.000  fractions.py:295(forward)
     99998    0.641    0.000    2.881    0.000  fractions.py:322(_add)
     99999    0.202    0.000    1.556    0.000  benchmark.py:3(<genexpr>)
    199996    0.829    0.000    0.829    0.000  fractions.py:17(gcd)
    199996    0.477    0.000    0.757    0.000  abc.py:63(__new__)
    399993    0.246    0.000    0.246    0.000  abc.py:164(__instancecheck__)
    199996    0.207    0.000    0.207    0.000  {method 'get' of 'dictproxy' objects}
    100071    0.185    0.000    0.185    0.000  {isinstance}
    399990    0.109    0.000    0.109    0.000  fractions.py:200(denominator)
    200004    0.073    0.000    0.073    0.000  {built-in method __new__ of type object at 0xfeea0}
    199995    0.065    0.000    0.065    0.000  fractions.py:196(numerator)

    @gvanrossum
    Member

    Yay! And my benchmark went from 70 usec to 15 usec. Not bad!

    PS. Never use relative speedup numbers based on profiled code -- the
    profiler skews things tremendously. Not saying you did, just stating
    the obvious. :)

    @gvanrossum
    Member

    PS. I can shave nearly 4 usec off the constructor like this:

    -        self = super(Fraction, cls).__new__(cls)
    +        if cls is Fraction:
    +            self = object.__new__(cls)
    +        else:
    +            self = super().__new__(cls)

    This would seem to give an upper bound for the gain you can make by
    moving the check for instantiating an abstract class to C. Your call.

    I also found that F(2,3)+F(5,7) takes about 22 usecs (or 18 using the
    above hack).

    Inlining the common case for addition (Fraction+Fraction) made that go
    down to 14 usec (combined with the above hack):

    -    __add__, __radd__ = _operator_fallbacks(_add, operator.add)
    +    __xadd__, __radd__ = _operator_fallbacks(_add, operator.add)
     
    +    def __add__(self, other):
    +        if type(other) is Fraction:
    +            na = self._numerator
    +            da = self._denominator
    +            nb = other._numerator
    +            db = other._denominator
    +            return Fraction(na*db + nb*da, da*db)
    +        return self.__xadd__(other)
    +

    @jyasskin
    Mannequin Author

    jyasskin mannequin commented Feb 14, 2008

    Thanks for writing the __add__ optimization down. I hadn't realized how
    simple it was. I think both optimizations are worth it. 22% on a rarely
    used class is worth 24 lines of python, and I think nearly eliminating
    the overhead of ABCs (which will prevent them from getting a bad
    performance reputation) is worth some C.

    @mdickinson
    Member

    Two things:

    (1) Speedup. I haven't been helping much here; apologies. Christian
    suggested that a C implementation of gcd might help. Is this true, or
    are we not yet at the stage where the gcd calls are significant? There
    are some basic tricks that can speed up the usual Euclidean algorithm by
    a constant factor, and I'll try to implement them if that's of interest
    (unless Christian beats me to it). There are also some serious super-fast
    recursive gcd algorithms, with running time O(n**(1+epsilon)) for a pair
    of n-bit integers, but those are complicated and probably best left to
    GMP.

    (2) PEP-3101: Should Fraction get a __format__ method? I'm looking at
    implementing Decimal.__format__, and I think there's quite likely to be
    a fair amount of common code (e.g. for parsing the format specifier)
    between Decimal.__format__ and Fraction.__format__, so it would probably
    make sense for me to do both of these at once.

    @jyasskin
    Mannequin Author

    jyasskin mannequin commented Feb 18, 2008

    1. No worries. Even without inlining the common case of __add__, etc.,
      Fraction is now faster than Decimal for smallish fractions
      [sum(Fraction(1, i) for i in range(1, 100))], and for large ones
      [sum(Fraction(1, i) for i in range(1, 1000))] gcd takes so much of the
      time that I can't see the effects of any of the optimizations I've made.
      Since I wasn't thinking of this as a high-performance class, I'm taking
      that as a sign to stop optimizing, but gcd is definitely the function to
      tackle if anyone wants to keep going.

    2. I haven't been following __format__, but adding it to Rational sounds
      fine to me.

    @gvanrossum
    Member

    A C reimplementation of gcd probably makes little sense -- for large
    numbers it is dominated by the cost of the arithmetic, and that will
    remain the same.

    @mdickinson
    Member

    A C reimplementation of gcd probably makes little sense -- for large
    numbers it is dominated by the cost of the arithmetic, and that will
    remain the same.

    That's true for a direct translation of the usual algorithm. But
    there's a variant due to Lehmer which reduces the number of
    multi-precision operations, and dramatically reduces the number of
    full-precision divmod operations---they're mostly replaced by
    n-limb-by-1-limb multiplications and full-precision additions. I'd
    expect at least a factor of 2 speedup from using this; possibly more.
    Of course it's impossible to tell without implementing it.

    I've attached a Python version; note that:

    • the operations to find x and y at the start of the outer while loop
      are simple lookups when written in C
    • the else branch is taken rarely: usually once at the beginning of
      any gcd() calculation, and then less than 1 time in 20000 on average
      during the reduction.
    • all the arithmetic in the inner while loop is done with numbers <=
      2**15 in absolute value.

    Mark

    @gvanrossum
    Member

    I think this is definitely premature optimization. :-)

    In any case, before going any further, you should design a benchmark
    and defend it.

    @mdickinson
    Member

    Replacing lehmer_gcd.py with a revised version. Even in Python, this
    version is faster than the current one, on my machine, once both numbers
    are greater than 10**650 or so (your crossover points may vary). It's
    over four times faster for very large inputs (over 10**20000).

    In any case, before going any further, you should design a benchmark
    and defend it.

    Okay. I'll stop now :)

    @mdickinson
    Member

    Okay. I'll stop now :)

    So I lied. In a spirit of enquiry, I implemented math.gcd, and wrote a
    script (lehmer_gcd.py, attached) to produce some relative timings. Here
    are the results, on my MacBook Pro (OS X 10.5):

    An explanation of the table: I'm computing gcd(a, b) for randomly
    chosen a and b with a_digits and b_digits decimal digits, respectively.
    The times in the 5th and 6th columns are in seconds; the last column
    gives the factor by which math.gcd is faster than the direct Euclidean
    Python implementation.

    Even for quite small a and b, math.gcd(a, b) is around 10 times faster
    than its Python counterpart. For large, roughly equal-sized, a and b,
    the C version runs over 30 times faster.

    The smallest difference occurs when the two arguments are very different
    sizes; then both versions of gcd essentially do one time-consuming
    divmod, followed by a handful of tiny operations, so they take
    essentially the same time. For Fraction, the arguments will tend to be
    around equal size for many applications: this assumes that the actual
    real number represented by the Fraction is within a sensible range, even
    when the numerator and denominator have grown huge.

    For random a and b, gcd(a, b) tends to be very small. So for the last
    few entries in the table, the gcds have been artificially inflated (by
    calling gcd(g*a, g*b) for some largish g).

    On to the results.

    a_digits b_digits g_digits ntrials C_Lehmer Py_Euclid Factor
    -------- -------- -------- ------- -------- --------- ------
    1 1 0 100000 0.066856 0.240867 3.602780
    2 2 0 100000 0.070256 0.314639 4.478466
    4 4 0 100000 0.075708 0.466596 6.163087
    7 7 0 100000 0.082714 0.681443 8.238537
    10 10 0 10000 0.022336 0.318501 14.259532
    20 20 0 10000 0.045209 0.752662 16.648436
    50 50 0 10000 0.128269 2.167890 16.901128
    100 100 0 1000 0.028053 0.511769 18.242906
    1000 1000 0 1000 0.709453 18.867336 26.594197
    10000 10000 0 100 4.215795 148.597223 35.247736

    Lopsided gcds
    -------------
    20 100 0 100 0.000889 0.007659 8.616953
    100 20 0 100 0.000887 0.007665 8.642473
    1000 3 0 100 0.000727 0.001387 1.908167
    3 1000 0 100 0.000754 0.001457 1.932027
    10000 1 0 100 0.004689 0.004948 1.055219
    1 10000 0 100 0.004691 0.005038 1.073948
    500 400 0 100 0.020289 0.388080 19.127665
    400 500 0 100 0.020271 0.389301 19.204768

    Large gcd
    ---------
    10 10 10 1000 0.005235 0.043507 8.310880
    20 20 20 1000 0.008505 0.095732 11.256167
    100 100 100 1000 0.041734 0.812656 19.472284

    @mdickinson
    Member

    And here's the diff for math.gcd. It's not production quality---it's just
    there to show what's possible.

    @mdickinson
    Member

    I'm wondering what there is left to do here. Are we close to being able
    to close this issue? Some thoughts:

    (1) I think the docs could use a little work (just expanding, and giving
    a few examples). But it doesn't seem urgent that this gets done before
    the next round of alphas.

    (2) Looking at it, I'm really not sure that implementing
    Rational.__format__ is worthwhile; it's not obvious that we need a way
    to format a Rational as a float. So I'm -0 on this. If others think
    Rational should have a __format__ method I'm still happy to implement
    it.

    Is there much else that needs to be done here?

    @jyasskin
    Mannequin Author

    jyasskin mannequin commented Feb 29, 2008

    I agree that we're basically done here. I'm back to -1 on inlining the
    common case for arithmetic (attached here anyway). Simple cases are
    already pretty fast, and bigger fractions are dominated by gcd time, not
    function call overhead. Since duplicating the definitions of arithmetic
    will make it harder to do tricky things that shrink the arguments to
    gcd(), we shouldn't do it.

    I was always +0 to __format__, so not doing it is fine with me.

    Thanks for everyone's help!

    @jyasskin jyasskin mannequin closed this as completed Feb 29, 2008
    @mdickinson
    Member

    I'm reopening this to propose a last-minute minor change:

    Can we remove the trailing 'L' from the numerator and denominator in
    the repr of a Fraction instance? E.g.,

    >>> x = Fraction('9.876543210')
    >>> x
    Fraction(987654321L, 100000000L)
    >>> x * (1/x)
    Fraction(1L, 1L)

    I'd rather see Fraction(987654321, 100000000) and Fraction(1, 1). Does
    the 'L' serve any useful purpose?

    @mdickinson mdickinson reopened this Jun 25, 2008
    @rhettinger
    Contributor

    +1 on removing the L. Also, it would be nice if reduced fractions were
    restored to ints after the gcd step:

    >>> Fraction(1*10**18, 3*10**18)
    Fraction(1L, 3L)

    @gvanrossum
    Member

    +1 on removing the trailing L from the repr.

    +0 on trying to reduce the values to ints; that would be dead-end code
    since in 3.0 it's a non-issue. And it doesn't solve the problem with repr.

    @mdickinson
    Member

    Trailing 'L's removed in r64557.

    A couple more minor issues:

    (1) fractions.gcd is exported but not documented. I'll add some
    documentation for it, unless anyone feels that it shouldn't have been
    exported in the first place. If so, please speak up!

    (2) The Fraction constructor is a little more lenient than it ought to
    be. For example:

    >>> Fraction('1.23', 1)
    Fraction(123, 100)

    I think this should really be a TypeError: a second argument to the
    constructor should only be permitted when the first argument is an
    integer. However, it seems fairly harmless, so I'm inclined to just
    leave this as it is and not mess with it.

    @mdickinson
    Member

    Hmm. I don't much like this, though:

    >>> Fraction('1.23', 1.0)
    Fraction(123, 100)
    >>> Fraction('1.23', 1+0j)
    Fraction(123, 100)

    I'd say these should definitely be TypeErrors.

    @gvanrossum
    Member

    I think it's okay to accept Fraction('1.23', 1) -- if only because
    rejecting it means changing the default for the 2nd arg to None and that
    would slow it down again.

    I agree that if the type of the 2nd arg isn't int or long it should be
    rejected. That should not slow down the common path (two ints).

    @mdickinson
    Member

    I agree that if the type of the 2nd arg isn't int or long it should be
    rejected. That should not slow down the common path (two ints).

    Sounds good. Some care might be needed to avoid invalidating
    Fraction(n, d) where n and d are instances of numbers.Integral but not of
    type int or long.

    @mdickinson
    Member

    I agree that if the type of the 2nd arg isn't int or long it should be
    rejected. That should not slow down the common path (two ints).

    I'm having second thoughts about this; it doesn't seem worth adding an
    extra check that has to be applied every time a Fraction is created from a
    string (or another Rational instance). The condition being checked for (a
    denominator equal to 1, but not of type int or long) is unlikely to occur
    in practice, and fairly harmless even if it does occur. So I'm thinking
    that PBP (practicality beats purity) says leave it alone.

    @gvanrossum
    Member

    That's fine.

    @mdickinson
    Member

    Well, I can't find anything more to fuss about here. :-)

    Reclosing.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022