Issue 513866: Float/long comparison anomaly

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/36040

classification

Title:	Float/long comparison anomaly
Type:		Stage:
Components:	Interpreter Core	Versions:

process

Status:	closed	Resolution:	fixed
Dependencies:		Superseder:
Assigned To:	tim.peters	Nosy List:	arkoenig, phr, tim.peters
Priority:	normal	Keywords:

Created on 2002-02-06 18:33 by arkoenig, last changed 2022-04-10 16:04 by admin. This issue is now closed.

Files
File name	Uploaded	Description	Edit
bug.py	arkoenig, 2002-02-06 18:33	program that reproduces the problem

Messages (14)
msg9154 - (view)	Author: Andrew Koenig (arkoenig)	Date: 2002-02-06 18:33
Comparing a float and a long appears to convert the long to float and then compare the two floats. This strategy is a problem because the conversion might lose precision. As a result, == is not an equivalence relation and < is not an order relation. For example, it is possible to create three numbers a, b, and c such that a==b, b==c, and a!=c.
msg9155 - (view)	Author: Tim Peters (tim.peters) *	Date: 2002-02-07 04:52
Logged In: YES user_id=31435 Since the coercion to float is documented and intended, it's not "a bug" (it's functioning as designed), although you may wish to argue for a different design, in which case making an incompatible change would first require a PEP and community debate. Information loss in operations involving floats comes with the territory, and I don't see a reason to single this particular case out as especially surprising. OTOH, I expect it would be especially surprising to a majority of users if the implicit coercion in somefloat == somelong could lead to a different result than the explicit coercion in long(somefloat) == somelong Note that the "long" type isn't unique here: the same is true of mixing Python ints with Python floats on boxes where C longs have more bits of precision than C doubles (e.g., Linux for IA64, and Crays).
msg9156 - (view)	Author: Tim Peters (tim.peters) *	Date: 2002-02-07 04:59
Logged In: YES user_id=31435 Oops! I meant """ could lead to a different result than the explicit coercion in somefloat == float(somelong) """
msg9157 - (view)	Author: Andrew Koenig (arkoenig)	Date: 2002-02-07 13:28
Logged In: YES user_id=418174 The difficulty is that as defined, < is not an order relation, because there exist values a, b, c such that a<b, b==c, and a==c. I believe that there also exist values such that a<b, b<c, and a==c. Under such circumstances, it is hard to understand how sort can work properly, whicn is my real concern. Do you really want to warn people that they shouldn't sort lists containing floats and longs? Moreover, it is not terribly difficult to define the comparisons so that == is an equivalence relation and < is an order relation. The idea is that for any floating-point system, there is a threshold T such that if x is a float value >=T, converting x to long will not lose information, and if x is a long value <=T, converting x to float will not lose information. Therefore, instead of always converting to long, it suffices to convert in a direction chosen by comparing the operands to T (without conversion) first.
msg9158 - (view)	Author: Andrew Koenig (arkoenig)	Date: 2002-02-07 15:33
Logged In: YES user_id=418174 Here is yet another surprise: x=[1L<10000] y=[0.0] z=x+y Now I can execute x.sort() and y.sort() successfully, but z.sort blows up.
msg9159 - (view)	Author: Tim Peters (tim.peters) *	Date: 2002-02-09 07:33
Logged In: YES user_id=31435 I reopened this, but unassigned it since I can't justify working on it (the benefit/cost ratio of fixing it is down in the noise compared to other things that should be done). I no longer think we'd need a PEP to change the behavior, and agree it would be nice to change it. Changing it may surprise people expecting Python to work like C (C99 says that when integral -> floating conversion is in range but can't be done exactly, either of the closest representable floating numbers may be returned; Python inherits the platform C's behavior here for Python int -> Python float conversion (C long -> C double); when the conversion is out of range, C doesn't define what happens, and Python inherits that too before 2.2 (Infinities and NaNs are what I've seen most often, varying by platform); in 2.2 it raises OverflowError). I'm not sure it's possible for a<b and b<c and a==c, unless the platform C is inconsistent (meaning that (double)i for a fixed i returns the next-lowest double on some calls but the next-higher on others). This brute-force searcher didn't turn up any examples on my box: f = 2L**53 - 5 # straddle the representable-as-double limit nums = [f+i for i in range(50)] nums.extend(map(float, nums)) for a in nums: . for b in nums: . if not a < b: . continue . for c in nums: . if not b < c: . continue . if a >= c: . print `a`, `b`, `c`
msg9160 - (view)	Author: Andrew Koenig (arkoenig)	Date: 2002-02-09 15:42
Logged In: YES user_id=418174 I completely agree it's not a high-priority item, especially because it may be complicated to fix. I think that the fundamental problem is that there is no common type to which both float and long can be converted without losing information, which complicates both the definition and implementation of comparison. Accordingly, it might make sense to think about this issue in conjunction with future consideration of rational numbers.
msg9161 - (view)	Author: paul rubin (phr)	Date: 2002-02-17 14:43
Logged In: YES user_id=72053 I hope there's a simple solution to this--it's obvious what the right result should be mathematically if you compare 1L<<10000 with 0.0. It should not raise an error. If the documented behavior leads to raising an error, then there's a bug in the document. I agree that it's not the highest priority bug in the world, but it doesn't seem that complicated. If n is a long and x is a float, both >= 0, what happens if you do this, to implement cmp(n,x): xl = long(x) # if x has a fraction part and int part is == n, then x>n if float(xl)!=x and xl==n: return 1 return cmp(n, xl) If both are < 0, change 1 to -1 above. If x and n are of opposite sign, the positive one is greater. Unless I missed something (which is possible--I'm not too alert right now) the above should be ok in all cases. Basically you use long as the common type to convert to; you do lose information when converting a non-integer, but for the comparison with an integer, you don't need the lost information other than knowing whether it was nonzero, which you find out by converting the long back to a float.
msg9162 - (view)	Author: paul rubin (phr)	Date: 2002-02-17 14:58
Logged In: YES user_id=72053 Oops, I got confused about the order of the two args in the example below. I meant cmp(x,n) in the description and cmp(xl, n) in the code, rather than having n first. Anyway you get the idea. Now I should go back to bed ;-).
msg9163 - (view)	Author: Tim Peters (tim.peters) *	Date: 2002-02-17 15:31
Logged In: YES user_id=31435 Paul, this isn't an intellectual challenge -- I expect any numerical programmer of ordinary skill could write code to compare a float to a long delivering the mathematically sensible result. There are several ways to do it. Adding "me too" votes doesn't change the priority. How about taking a whack at writing a patch if this is important to you? It's so low on the list of PythonLabs priorities I doubt I'll ever get to it (which is why I unassigned myself: an unassigned bug report is looking for someone to fix it, not a cheerleader <wink>).
msg9164 - (view)	Author: Andrew Koenig (arkoenig)	Date: 2002-02-17 15:46
Logged In: YES user_id=418174 I think there's a slightly more straightforward algorithm than the one that Paul Rubin (phr) suggested. Again, assume that x is a float and n is a long. We note first that the comparison is trivial unless x and n are both nonzero and have the same sign. We will therefore assume in the rest of this discussion that x and n are strictly positive; the case where they are negative is analogous. Every floating-point implementation has many numbers with the property that the least significant bit in those numbers' representations has a value of 1. In general, if the floating-point representation has k bits, then any integer in the range [2(k-1),2k) qualifies. Let K be any of these numbers; it doesn't matter which one. Precompute K and store it in both float and long form. This computation is exact because K is an integer that has an exact representation in floating-point form. It is now possible to compare x with K and n with K exactly, without conversion, because we already have K exactly in both forms. If x < K and n >= K, then x < n and we're done. If x > K and n <= K, then x > n and we're done. Otherwise, x and n are on the same side of K (possibly being equal to K). If x >= K and n >= K, then the LSB of x is at least 1, so we can convert x to long without losing information. Therefore, cmp(x, n) is cmp(long(x), n). If x <= K and n <= K, then then n is small enough that it has an exact representation as a float. Therefore cmp(x, n) is cmp(x, float(n)). So I don't think there's any profound algorithmic problem here. Unfortunately, I don't know enough about the details of how comparison is implemented to be willing to try my hand at a patch.
msg9165 - (view)	Author: paul rubin (phr)	Date: 2002-02-17 16:22
Logged In: YES user_id=72053 It looks like the complication is not in finding an algorithm but rather in fitting it into the implementation. I'm not at all sure this is right, but glancing at the code, the comparison seems to happen in the function try_3way_compare in Objects/object.c, which calls PyNumber_CoerceEx if the types aren't equal. PyNumber_CoerceEx ends up calling float_coerce on x,n which "promotes" n to float, similar to what happens when you do mixed arithmetic (like x+n). My guess is that a suitable patch would go into try_3way_compare to specially notice when you're comparing a float and a long, and avoid the coercion. I'm unfamiliar enough with the implementation that I'd probably take a while to get it right, and still possibly end up forgetting to update a refcount or something, leading to leaked memory or mysterious crashes later. Anyway, no, this isn't real important to me, at least at the moment. It just wasn't clear whether there was any difficulty figuring out a useable algorithm.
msg9166 - (view)	Author: Tim Peters (tim.peters) *	Date: 2004-09-23 08:08
Logged In: YES user_id=31435 I'm on vacation, so I fixed this <wink>. The changes are so messy I doubt it will get backported to the 2.3 line, though (I'm not going to try -- someone else would need to volunteer for that). Lib/test/test_long.py 1.25 Misc/NEWS 1.1142 Objects/floatobject.c 2.133
msg9167 - (view)	Author: Andrew Koenig (arkoenig)	Date: 2004-10-02 04:16
Logged In: YES user_id=418174 Thank you! This is seriously cool...

History
Date	User	Action	Args
2022-04-10 16:04:57	admin	set	github: 36040
2002-02-06 18:33:25	arkoenig	create